Thursday, August 30, 2012

Greasemonkey API Usage -- August 2012

Back in November 2009, I analyzed the API usage, and a few other aspects, of all the scripts on userscripts.org, then 36,141 scripts.  I was directly discussing some of the topics that were already bubbling around the back of our minds, for how to carry Greasemonkey into the future with us.  The short version is: web browsers and web apps are getting so much more powerful, why do we need these one-off Greasemonkey APIs with cross-browser problems?

Now that Greasemonkey 1.0 is out, we've made big steps in that direction, and I've repeated the analysis.  I downloaded (with permission from the site owner) every single active script on userscripts.org, now 82,084 scripts.

First up, which API calls are made, and how common are they?
API Usage by Number of Scripts

Not much has changed.  By far the most common case is a script that doesn't call any special APIs (57.94%).  Then, GM_getValue/GM_setValue are still right up there.

The biggest change is that unsafeWindow usage has jumped from 6th to 2nd place (12.44% to 17.65% of scripts; 1,527 scripts also mention wrappedJSObject, not on the chart).  Authors want to interact with the page in ways that the security sandbox (which protects these APIs) prevents, so they explicitly jump out of the sandbox, bringing vulnerabilities with them.  Of the 14,484 scripts that reference unsafeWindow, 6,494 of them use no other Greasemonkey APIs, and thus are served well by moving towards a model where there is no sandboxing (nor APIs, unless you ask for them).  Another 838 scripts only use GM_log and/or GM_addStyle, which can easily be replaced with console.log() or the compatibility shim layer.  It gets hard to analyze other calls in more detail, but I see a lot of get/set value calls, which (assuming you run on only one domain) can also be well served by DOM Storage.
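To make the GM_log and GM_addStyle point concrete, here is a minimal sketch of what such replacements could look like using only standard browser features.  The names log and addStyle are hypothetical, this is not the actual compatibility shim, and the optional doc parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical stand-in for GM_log: forward everything to the standard console.
function log(...args) {
  console.log(...args);
}

// Hypothetical stand-in for GM_addStyle: append a <style> element to the page.
// `doc` defaults to the page's document; it is a parameter only for testing.
function addStyle(css, doc = document) {
  const style = doc.createElement('style');
  style.textContent = css;
  // Fall back to the root element if the page has no <head> yet.
  (doc.head || doc.documentElement).appendChild(style);
  return style;
}

// In a user script this would be used roughly as:
//   log('script started');
//   addStyle('.ad-banner { display: none; }');
```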

Moreover, those 47,557 scripts that don't use any of the special APIs are still saddled with the sandbox and its pitfalls, known and unknown before Greasemonkey 1.0.  Plenty of newer browser features don't work in the security sandbox because its entire point is to separate the content scope (where these features work) from the script scope (with its smaller set of privileged features).  A huge part of the design changes in Greasemonkey 1.0 is to make the default behavior, which the majority of scripts need and want, run as close as possible to a regular script in a regular web page, without surprises like missing values and broken features.

So do scripts that use get/set value or xmlhttpRequest really need the cross-domain behavior they provide?
API Usage Cross-domain Analysis
(Note: the left-most set of bars is "@include *" and the rightmost is ">5" -- the labels are missing from the graph and I'm not sure why.)
Mostly: no, and this hasn't changed much since 2009.  The vast majority of scripts using get/set value (71.86%) only ever execute on a single domain, and thus can use DOM Storage with no ill effects.
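For a script that only ever runs on one domain, a get/set value pair backed by DOM Storage might look like the sketch below.  The names makeValueStore and the 'myscript.' prefix are hypothetical; values are JSON-encoded here as one possible choice, not as a description of how GM_getValue/GM_setValue store data.  Keep in mind that localStorage is per-origin and is also readable and writable by the page's own scripts.

```javascript
// Build a small get/set value API on top of any Storage-like object
// (anything with getItem/setItem, e.g. window.localStorage).
function makeValueStore(storage, prefix) {
  return {
    // Return the stored value, or defaultValue if the key is missing or unreadable.
    getValue(key, defaultValue) {
      const raw = storage.getItem(prefix + key);
      if (raw === null) return defaultValue;
      try {
        return JSON.parse(raw);
      } catch (e) {
        return defaultValue;
      }
    },
    // Store any JSON-serializable value under the prefixed key.
    setValue(key, value) {
      storage.setItem(prefix + key, JSON.stringify(value));
    },
  };
}

// In a user script this would be used roughly as:
//   const store = makeValueStore(window.localStorage, 'myscript.');
//   store.setValue('runCount', store.getValue('runCount', 0) + 1);
```

The prefix is there to reduce the chance of colliding with keys the site itself stores.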

The XHR usage to two domains is lower mostly because I fixed my analysis a bit (i.e. not counting an @include of *.example.com and an XHR to www.example.com as two domains, and not counting XHRs to userscripts.org, assumed to be update checker scripts, which is now provided by Greasemonkey).  However, a combined 44.25% of scripts that call XHR (and with a string literal that I could pull a domain name out of, not a variable set somewhere else) either call to/run on two or all domains, and thus really use the cross-domain power of GM_xmlhttpRequest.
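Whether a given request needs GM_xmlhttpRequest at all can be decided per URL.  The sketch below is one way to make that decision with the standard URL class; isSameOrigin and chooseTransport are hypothetical names, and this is not the logic used in the analysis script (which compared domain names pulled from string literals, not full origins).

```javascript
// True when targetUrl resolves to the same origin (scheme, host, port) as pageUrl.
function isSameOrigin(targetUrl, pageUrl) {
  const page = new URL(pageUrl);
  // Relative URLs such as "/api/items" are resolved against the page URL.
  const target = new URL(targetUrl, page);
  return target.origin === page.origin;
}

// Name the request mechanism a script would need for this target.
function chooseTransport(targetUrl, pageUrl) {
  return isSameOrigin(targetUrl, pageUrl) ? 'XMLHttpRequest' : 'GM_xmlhttpRequest';
}

// In a user script this would be used roughly as:
//   if (isSameOrigin(url, location.href)) { /* plain XMLHttpRequest */ }
//   else { /* GM_xmlhttpRequest({method: 'GET', url: url, onload: ...}) */ }
```

Note that origin is stricter than domain: www.example.com and api.example.com are different origins, as are the http and https versions of the same host.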

Finally, a bit more detail about metadata imperatives.  This graph is for all imperatives used in at least 1% of scripts, regardless of what they are.
Greasemonkey Metadata Imperative Usage
Most of what has changed since 2009 is the analysis, including more values.  Note that almost every script (99.37%) specifies @name, and we see a power law trail off in usage.  The commonly used, but unsupported in Greasemonkey, ones are @author, @homepage, @license/@copyright, @date, and @history.

Check the raw data to see hundreds more @things, generally all unsupported values.  And there I pasted only those used at least ten times; there are hundreds more used fewer times.
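For readers who want to run a similar count over their own scripts, here is a rough sketch of pulling the @imperatives out of a script's metadata block.  The function name parseMetadata is hypothetical, and this is an illustration only, not the analysis script mentioned in this post.

```javascript
// Collect every "// @key value" line found between the
// "// ==UserScript==" and "// ==/UserScript==" markers.
// Returns an object mapping each key to the list of its values.
function parseMetadata(source) {
  const result = Object.create(null); // no inherited keys such as "constructor"
  const block = /\/\/\s*==UserScript==([\s\S]*?)\/\/\s*==\/UserScript==/.exec(source);
  if (!block) return result;
  for (const line of block[1].split(/\r?\n/)) {
    const m = /^\s*\/\/\s*@(\S+)(?:\s+(.*?))?\s*$/.exec(line);
    if (!m) continue;
    const key = m[1];
    const value = m[2] || ''; // imperatives with no value get an empty string
    if (!result[key]) result[key] = [];
    result[key].push(value);
  }
  return result;
}
```

Counting how many scripts use each key is then a matter of calling this on every script and tallying Object.keys() of the results.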

To those that are interested: the script that I used to generate these numbers is available for inspection, in case it perhaps contains a serious bug. The raw data that I generated with it, and the charts above, are also available to check.

2 comments:

nascent said...

Fascinating post, thanks for the information.

I am curious about "GM_Log can easily be replaced with console.log()"
If this is a redundant API, why not just call Console.log() with GM_Log?

Also, you mention that scripts using xmlhttpRequest don't need cross domain behaviour, and can "use DOM Storage with no ill effects."

Does this mean there's a more efficient way to achieve xmlhttpRequest for a single domain?

It'd be cool if there was a post explaining best practises.

arantius said...

> If this is a redundant API, why not just call Console.log() with GM_Log?

That sentence doesn't make a lot of sense. But we're not removing it just to break old scripts, there's little to no cost to leave it in Greasemonkey. The point is that if your script does, today, call GM_log, you can trivially change it to call console.log(), and thus not need that special Greasemonkey API.


> Also, you mention that scripts using xmlhttpRequest don't need cross domain behaviour, and can "use DOM Storage with no ill effects."
> Does this mean there's a more efficient way to achieve xmlhttpRequest for a single domain?

I said that _if_ you don't need cross-origin behavior, then you can use the standard browser features (within a single origin) and get the same functionality. I linked the DOM storage docs, and XHR is the obvious page: https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest