This blog is focussed on personal computing, but the artefacts we sit down and use every day are impacted by changes elsewhere in the tech ecosystem. I’ve spent the last 5 years working in the blockchain space, and I think it’s maturing to the point where we can start to see its impact on our computing environments. This post expands on a Twitter thread.
Note that I use ‘blockchain’ throughout as an umbrella term for a lot of interesting work on provable data structures, distributed protocols, smart contracts, and the like. Early on, some of us tried to popularise ‘distributed ledgers’ as the umbrella term, but it doesn’t seem to have stuck. However, I still see disagreements between people using ‘blockchain’ as the umbrella term and people meaning ‘blockchain’ the data structure. So it goes…
I sometimes see an argument along the lines of “blockchains were designed to decentralise technology, but if we’re not careful big organisations will use it to centralise even more”. I don’t think this is quite right: in my opinion blockchain technology will increase both centralisation and decentralisation at the same time.
I think this is actually true of all communications technology as it drops transaction costs. Roy Bahat has a really interesting article pointing out that companies in the US have been getting both bigger and smaller since 2000. The average company is smaller (especially if you only count those with fewer than 500 employees), but pick an employed person at random and they’re likely to work at a larger organisation. It’s polarisation — there are more small companies than before, but the big ones are a lot, lot bigger. As Roy puts it:
As those transaction costs go down — reputations are shared online, so people trust each other, marketplaces help you find ever-more granular business services, communication becomes free — the corporation could disassemble into its components, with one small firm (or even a person) doing each task and coordinating with others.
The trick is that those same falling transaction costs can also make it easier to do business within a company, too (Intranets instead of employee manuals, Slack instead of a phone tree, etc.) — which might make some companies bigger as they’re capable of managing bigger staffs.
I think looking at the impact of blockchain with that in mind is valuable. The first question is: decentralising what, exactly? Simon Wardley recently pointed out to me that decentralisation of infrastructure can happen independently of decentralisation of power: if Amazon launch delivery drones, it may be decentralised in terms of each drone acting on its own local knowledge and goals, yet the fleet as a whole is owned by Amazon for its profit. I personally believe the decentralisation of infrastructure is somewhat inevitable: the more interesting question is the decentralisation of power.
As I see it, in the decentralisation of power camp there are some really interesting strands of work going on around incentivising coordinated behaviour (e.g. mechanism design) and community governance (e.g. Elinor Ostrom’s work). With suitable technology behind them — for example, encoding governance in smart contracts — future decentralised communities will act with a purpose and coherence that previously was only achievable by the best organisations.
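Encoding a governance rule in code is less exotic than it sounds. Here is a toy sketch — plain Python rather than an actual smart contract, with made-up parameters — of the kind of rule a community might encode: a proposal passes only if turnout meets a quorum and the yes-votes win.

```python
# A hypothetical quorum-and-majority rule, written as a pure function.
# On a real chain this logic would live in a smart contract; the point
# here is just that the governance rule itself is ordinary code.

def proposal_passes(yes, no, members, quorum=0.5):
    """Return True if turnout meets the quorum and yes-votes outnumber no-votes."""
    turnout = (yes + no) / members
    return turnout >= quorum and yes > no

print(proposal_passes(yes=6, no=3, members=12))  # True: 9/12 voted, majority yes
print(proposal_passes(yes=2, no=1, members=12))  # False: only 3/12 voted
```

The interesting design work is in choosing the rule (quorums, vote weighting, delegation); once chosen, the technology makes it self-executing.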
At the other end, I think the super-organisations of tomorrow are going to look very different from those of today. I think they’ll be open platforms supporting large ecosystems, with a hazy line between the core organisation and the partners and customers around it. Simon Wardley has sometimes referred to the process of ‘gardening’, which I think is apt — tending a platform ecosystem like planting, weeding, and watering. Key is that these platforms will be enabled by exactly the same technology that underpins the decentralised platforms: the only difference will be the power structures behind them.
A question remains: which industries, companies, or use cases will end up being decentralised and which centralised? I don’t think it’s as easy as saying all large organisations will get larger. It’s likely that certain problems will be better suited to one model or the other. My guess is that organisations will need to keep moving up the value chain, as anything that looks like it should be common infrastructure will be targeted by decentralised communities.
The really interesting thing is when you step back and look at this polarisation as a user — it might not really be possible to tell one end from the other. The large decentralised efforts will have sophisticated incentive alignments and governance (and branding), and the super-organisations will be amorphous and fluid. I’m sure there’ll be a lot of horizontal transfer of good ideas. There are a few regulatory factors which give incorporated entities the edge for now (e.g. corporations have coordination rights denied to non-corporations), but it’s possible for that to change in time. Perhaps it’s less two ends of a spectrum and more like separate paths toward some unified future state. Should be a fun ride…
Last month saw the news that Microsoft was shifting Edge’s rendering engine from its proprietary one to Chromium. Opinion on Twitter seemed to be split between joy at Microsoft finally embracing an open-source, standards-compliant engine, and horror at the thought of browsers becoming an anti-competitive monopoly controlled by Google — a repeat of the dark days of IE6. It was certainly not great timing for a Chromium vulnerability (via SQLite) to be discovered.
Sad to see Microsoft throw in the towel on their own browser rendering engine. The web doesn't benefit when developers are encouraged to "just test in Chrome" through consolidation. We need a strong, diverse set of browsers. HANG IN THERE FIREFOX! https://t.co/9DxIOPich3
I’m certainly disappointed by this news, but I think that the reality is that the Web has been stuck in a monoculture and it shows in the current Web standards. This was Mozilla's response, and I respect what Mozilla stands for, but when you look at the desktop browser chrome there’s almost nothing to tell them apart. Tabs, an address bar (perhaps combined with a search bar), bookmarks, and history. Interaction models are largely identical, and have been since Firefox introduced tabbed browsing in 2002.
Chrome OS, which had the opportunity to reimagine what a browser could look like when given the whole environment to play with, chose instead to remain identical to its "hosted" counterpart, recreating (or even exposing, with its new containerized Linux) a limited version of a 90s operating system around it.
Chrome OS Linux apps will soon be able to access your entire Downloads folder and Google Drive (all they need now are browser toolbars and DLLs and the security story will be complete) https://t.co/36QFiM9TbH
You might argue that browser chrome is distinct from the engine underneath it, and that’s true to an extent, but Web clients and their place in a modern computing environment determine what features should be standardized and supported by the engines. If your notion is that a browser is just another application that should be subservient to the machinery and affordances of the host environment that it’s running in, you’ll favor standards that are familiar over ones which are actually more native to the Web. The growing popularity of Electron, and calls for that functionality to be pulled back into the browser, are a particularly noticeable result.
However, the Web was always more radical than that. It had its own interaction models around hypermedia, and was flexible in a way that current systems still aren’t. User-installed style sheets and scripts (keep going, Greasemonkey!), Web feeds for users to consume information how they want (RIP browser feed managers) or even combine and extend them (RIP Yahoo Pipes), the Semantic Web, warts and all, and its vision of agents ranging over data sources, compiling custom views on behalf of the user. All of it sidelined. Even the broad principle of REST architectures allowing loosely coupled, evolvable clients has eroded into something weaker — proprietary clients (either in-browser or native clients using things like WKWebView) delivered with hard-coded, versioned endpoints. Instead of this grand vision, we have an identical set of browsers that vie to deliver increasingly opaque “Web Applications” to the user with perfect fidelity.
My mental model is that the Web sat on the cusp between two peaks — one is the 90s desktop model, and the other is a Web-native, decentralised hypermedia peak: not fully explored, yet with the potential to be much richer and more powerful. In the early days it looked like the Web was rising towards the higher peak, but instead, inevitably, it’s been dragged onto the safe, well-explored one.
So this is why I think the Web is dead — it has lost its way. As long as the current institutions are in charge of the browsers and the standards committees, its current path fundamentally restricts how powerful the Web can be in the future. And I don’t see a change in those institutions any time soon. The standards are large and sprawling, and it requires a significant investment to experiment in this space (not even Microsoft can compete), with little direct financial gain.
One path forward would be to increase competition in the browser space. Hopefully we’d stumble on powerful models for clients which take advantage of the native properties of the Web, rather than fight against them. Even outside of the engine, browsers are large, complex things, but perhaps we can modularize and package them in interesting ways. I particularly like this take from Patrick Walton talking about engines, but applied to browsers as a whole:
What if we thought of browser engines less as monolithic entities and more as distros of the various libraries that comprise the Web platform?
Many years ago I read The Humane Interface by Jef Raskin, which completely upended my notion of computing. It showed a vision of computing radically different from the prevailing paradigms. Until then, I hadn’t even realised it was possible to question core artefacts like files or applications.
Jef led a team to make that vision a reality in the form of Archy.
Archy never quite fulfilled its grand ambition, but its legacy continued in the forms of Enso (a Windows-focused implementation), and then Ubiquity (for Firefox).
HC: I wanted to start with the genesis of Archy. At what point did it coalesce and come together as a project?
AR: It’s been so long that I have to sort of dredge through the murky, muddy water of memory. After Jef published his book, The Humane Interface, there was a strong desire to take many of the concepts that Jef had continued to work on — cognetics, the ergonomics of the mind — and make a utility product that really worked the way our minds did. Many of them had first come to light in the Canon Cat, after the Mac, before getting shut down by Canon when an electronic typewriter line failed. They had put the Canon Cat into the same conceptual bucket as an electronic typewriter — just shows you what the thinking was like at that time. After selling something like 20,000 units they shut it down.
– A scan of a Canon Cat advert by Marcin Wichary (source)
Archy really got going in my junior year of college, so that was 2003 when we started coding. We were simultaneously working on a contract for Samsung to redesign their phones — we were thinking of doing it as a zooming concept — and using that funding to fund the Raskin Centre for Humane Interfaces, RCHI, and turn that into the be-all, end-all text editor. That became Archy.
HC: Where did the team come from?
AR: I sort of imagine it to be like those 80s movies where a motley crew assembles to win the Super Bowl, with a sumo wrestler and a clown — that was the kind of crew. People who had been in Jef’s orbit for a long time, like David Alzofon, who did a lot of the original manual writing for — it may have even been the Apple II — an incredible technical writer. Something Jef said all the time was: write the manual first. If you’re having trouble explaining how your product will be used, your users will have trouble using it.
HC: I feel that Archy, as it ended up, was a partial implementation of a much bigger vision. What was the grand idea for what this could potentially be?
AR: Yeah, this was a turning of computing over on its head. The idea being that when you sat down at your computer it’s supporting the thing you need to be doing. Immediately your text editor was up and you could start working.
You can think of applications as walled cities — they have to develop all of their own infrastructure. If you’re making Photoshop it needs to have spell check because you have a text editor in there, and if you’re making Word you need to have Photoshop abilities because you’re putting in photos and you’ll want to edit them. Over time the applications have to continue to increase in size and subsume more and more functionality until every application starts to converge from different directions on the same kind of application.
If that is where bloat comes from, and if 95% of the feature requests for Word are for features that already exist in Word, maybe it’s the application as a framework which is broken. You want to tear it apart and just have functionality that you can use anywhere.
There were no files, because the best label for a file is the file itself, the content of it. Everything was in one long, conceptual document, but we know that human beings work very well with spatial memory, so what you want is to have all of your content and work projects stored spatially. It’s supposed to be a full-on zooming user-interface (ZUI), so no matter where you are you can zoom out and grab your bearings, and zoom in ad infinitum, a much better way of doing folders and taxonomies.
Computers have this incredible magical power, and text can do so much more than just what a Word document can do. There’s a recent project called Observable, by Jeremy Ashkenas and Mr. Doob, and while it’s a very different take I think it starts to get towards a little bit of what that vision of Archy was. That magic of being able to program a little, to have parts of your text talk to other parts of your text, to have your documents be really alive so you can just cast magic at them. That it starts as simple as being able to type characters; that it has incremental search so you can move at the speed of your own thought; that there are very few unnecessary hand gestures; all the way up to any bit of text being able to refer to any other bit of text, being able to run any command and any functionality from anywhere, and being able to open up your tools and find them all coded in the same environment you were working in — very Smalltalk, or Alan Kay-like — so your tools themselves can be modified in real time if need be. I think that’s a little of the vision of what Archy was supposed to be like.
HC: That’s fascinating. It seems there’s that lineage of inspiration, indirectly drawing on the same sources. These are all modern environments which have little bits and pieces of that — it feels like this resurgence of notebook-style editors.
AR: Exactly. Sort of like Jupyter but the next iteration. And all of these things start to point back to much older concepts, from HyperCard to even Xanadu.
HC: The Humane Interface and Jef’s work through the Canon Cat were big inspirations, but clearly you’re drawing on a lot of older projects as inspiration as well. Were there any others that you would say were concretely referenced as providing input?
AR: Yeah, originally we thought we might implement Archy on top of Smalltalk, and Scratch. I think it’s really interesting now that the next person picking up that mantle is Bret Victor now at Dynamicland. It’s a tangible version of very similar concepts.
HC: It’s interesting that you mention Smalltalk as it feels that there are similar ideas in there. For instance, no applications — you’ve just got content that you’re operating on.
AR: That’s exactly right. In some ways you can think of the command system of Archy as being just like Unix commands. You hold down a button (so it’s a quasimode) and start typing what you want to do, like ‘spell check this’. It knows what input it expects and what output it produces, so if you run it on text of course it spell checks the text, but if you run it on an image it should run OCR and then do the spell check on top of that. It should be able to ask the system ‘I have an image but I expect text — do you have anything that turns images into text?’, and then it should all just happen behind the scenes.
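The mechanism described here — commands that declare the type they expect, with the system chaining converters when the input doesn’t match — can be sketched in a few lines. This is a hypothetical illustration, not Archy’s actual architecture; all names (`converter`, `command`, `run`, the fake OCR) are invented for the example.

```python
# Minimal sketch of type-aware command dispatch with converter chaining.
converters = {}  # (from_type, to_type) -> conversion function
commands = {}    # command name -> (expected input type, function)

def converter(src, dst):
    """Decorator registering a converter between two content types."""
    def register(fn):
        converters[(src, dst)] = fn
        return fn
    return register

def command(name, expects):
    """Decorator registering a command and the content type it expects."""
    def register(fn):
        commands[name] = (expects, fn)
        return fn
    return register

def run(name, value, value_type):
    """Run a command, converting the input first if its type doesn't match."""
    expects, fn = commands[name]
    if value_type != expects:
        # "I have X but I expect Y -- do you have anything that converts?"
        value = converters[(value_type, expects)](value)
    return fn(value)

# Stand-in "OCR": a real system would read pixels, not a dict field.
@converter("image", "text")
def ocr(image):
    return image["embedded_text"]

@command("spellcheck", expects="text")
def spellcheck(text):
    fixes = {"teh": "the", "recieve": "receive"}
    return " ".join(fixes.get(word, word) for word in text.split())

print(run("spellcheck", "teh quick brown fox", "text"))
print(run("spellcheck", {"embedded_text": "recieve a letter"}, "image"))
```

Running spell check on an "image" transparently routes through OCR first — the conversion happens behind the scenes, exactly as described above.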
HC: I can see that underlying similarity, and yet the concrete realisation of the projects is obviously very different — Smalltalk embraces iconic representation and involves heavy mouse usage, whereas the Archy system you’d built was almost a direct rejection of a lot of that.
AR: There’s certainly a very heavy keyboard focus because there’s so much emphasis put on thinking through the GOMS modelling, thinking through where errors happen and trying to minimise time and maximise information theoretic efficiency. The mouse was just not great for that.
HC: You’re quoted as saying that this was an environment you’d like to boot up into — so maybe not an operating system per se, but essentially a complete computing environment. Am I correct in thinking that was the plan?
AR: Yeah absolutely. Imagine Chromebook-style. If we were re-implementing it now maybe we’d do it on top of the Web as a platform, and then you’re sort of done. You can boot up directly into it. That can be your world.
HC: So you were imagining hardware as well? From using it one thing I did feel was that the dedicated keys for LEAP and Command would certainly have been enormously useful.
AR: A LEAP key changes everything. If my memory serves, we were thinking about a keyboard that has all of the software on it, so that you can walk around with your keyboard, plug it into a computer, and Archy would be on a USB drive inside of it — the entirety of your text, history, preferences, everything would just come along with you. You just needed to carry your keyboard and plug it in.
AR: Yeah, we were really focusing on the text editing, maybe even the coding. If, conceptually, a spreadsheet, a document, and a code editor have a baby, what do you end up with?
Doug McKenna wrote a fully zooming map library behind the interface. One of the problems you end up having with infinite zooming spaces, if you do a naive implementation, is that you can zoom in until you hit the limit of floating-point precision, and all of a sudden the ZUI just starts to bounce around. That requires some careful thought.
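The precision wall is easy to demonstrate. Below is a quick illustration (not Archy’s actual code) of what goes wrong in a naive implementation: deep in the document, world coordinates are large, so small offsets fall below the spacing between adjacent doubles and distinct points collapse together.

```python
# Naive world-to-screen transform a simple ZUI might use.
def screen_x(world_x, camera_x, zoom):
    return (world_x - camera_x) * zoom

# Two points 1e-10 apart in world space, far from the origin.
a = 1.0e8
b = a + 1e-10

# Near 1e8 the gap between adjacent doubles is about 1.5e-8, so the
# 1e-10 offset is absorbed entirely: the two points become identical.
print(a == b)                               # True -- they collapsed
print(screen_x(b, camera_x=a, zoom=1e12))   # 0.0, even at extreme zoom

# The usual fix, as described in the interview, is a hand-off between
# coordinate systems: store positions relative to a nearby local origin,
# where doubles still have plenty of precision.
local_a, local_b = 0.0, 1e-10
print(screen_x(local_b, camera_x=local_a, zoom=1e12))  # ~100.0: offset survives
```

Re-basing coordinates to a local origin as you zoom is what makes the "careful hand-off of coordinate systems" mentioned below necessary.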
You might want a portal so that you zoom from one area into another — the equivalent of a symlink. Now you can teleport into another part of the ZUI, but when you zoom out you always want zooming out to be the equivalent of going back, like the back button in a browser. So you have to do careful hand-offs of coordinate systems, and there are a lot of really interesting caching problems. Doug wrote that for use with the cellphone prototype we were working on. We put together a Flash demo of how this might work, where you could zoom in and out and see annotations on Web pages. Bret Victor actually got involved and created a corollary zooming user interface prototype, and we went back and forth with a couple of different iterations.
HC: There was some mailing list post from a while ago talking about the size of the project: ‘Archy turned out to be too big a project for what the small team were capable of. It was more complex given the number of specced features surrounding universal undo than any I know of.’ Can you elaborate on that?
AR: Oh yeah, it’s a really simple observation: anything you do on the computer, you should be able to undo. You shouldn’t feel like you’re walking through a minefield, where if you do the wrong thing or touch the wrong button you won’t be able to recover. So the question is: if you are making an extensible system, how do you ensure that in every application you can always get back to the last step?
It’s something we want in real life — the ability to take back the thing we just said — and at the very least we can do that on the computer.
But what does it mean to hit undo if you have a collaborative document that multiple people are editing — does hitting undo just undo your changes, or does it undo everyone’s? Should undo operate just in the document I’m currently looking at, or does it undo things globally?
There are a really interesting set of problems that come from a very simple thing which is the principle of being able to get back to where you were and keeping the environment always safe.
HC: Were there any big ticket items or anything glaring that you didn’t get on the roadmap?
AR: One of the largest problems was figuring out how this thing could really become your new environment to work in. How do you interoperate? You’re going to want to be doing all of your editing and coding inside of this thing, but the predominant form of doing that now is lots and lots of files. How do we interop between a world of folders and files and a world where there’s just one really long document, or a ZUI? Figuring out all of those bridges so that you could gracefully upgrade people from the current day to the Archy world — I think it was sort of an open-ended question: ‘How do we want to do that bridge?’
HC: Did you ever think about compromising, or was it the case that Archy was this very opinionated stance and other things were going to have to figure out how to fit into it?
AR: We were definitely thinking more along the lines of “this is a different way of doing computing”, so we would rather figure out the way of doing it in our world. Even coding — we thought a lot about what coding would look like in the zooming world.
So those were the big-ticket thought items where we had first stabs and gestures at it, maybe even a couple of prototypes, but they were super-early stage. I think we all knew those were going to be a place to put a lot of conceptual work — just trying things out and figuring it out. It’s always through making that you shine a flashlight on the gap between concept and implementation.
HC: Interesting that you mention that, because whilst I understood the roadmap and all the pieces, there were a few gaps where I didn’t quite see how it connected through. One of them was the Web — that left a bit of a question mark for me. The other was creative media-style applications: audio workstations, video, drawing applications. In some places it seemed like that might be a hard conceptual gap to close.
AR: Yeah, I think the Web is a particularly interesting one, because our thought was that wherever you have a link you would just embed the linked thing — the entire Web page — next to the link, as something you could zoom into. That’s the base layer, and you can start playing with that and have something a little bit better, something new.
Drawing is much easier because it’s not an application, it’s not a thing you go to. It’s a set of tools that come to you. Any time, anywhere you’d be able to pull up the palette and just start drawing, whether it’s on a Web page or on a document that you’re typing. You can just draw on the canvas of the ZUI world in general. If you zoomed out you’d be near a set of photos from some trip, and you can just write using whatever photo editing or illustration tool you had, for instance, ‘My trip to Panama’, or just a giant heart. You can just modify the space.
That’s a big part of the shift, thinking about the nouns of the world, the substrate of the ZUI, of the text document, as just an object which you could call up different verbs to do whatever you want with them.
HC: I wanted to move on to the end of Archy and the beginning of Humanized and the work that you were doing there. Could you give us a quick recap of the circumstances surrounding that changeover?
AR: Around the time that Jef got very sick we also got a fairly large contract to work on a zooming operating system for a Samsung phone. This was my final year of college, so it was pretty exciting — here was a concept of a phone that could get out to a whole slew of people. This was back in the day of flip phones and WAP browsing, so we were like: cool, instead of trying to fix the current system, we’re trying to paint a picture of where these interfaces could go in the new form factor. Jef was really interested in a clear keyboard which you could type on from behind, which we actually implemented.
And then — I guess this sometimes happens — there was some political stuff. The partner who was working closely with Samsung, I think, didn’t like that I and a couple of other people were much younger, still in college, and leading the charge; he wanted to lead. So he sort of threatened to pull the money over it. I think that whole thing, in many ways, took away the momentum of what we were working on. I remember I think I got fired, then hired again, then fired, then they tried to hire me again, and I was like ‘Nah, I don’t really want to do this’.
A group of us — Atul Varma, Jono DiCarlo, Andrew Wilson, and I — decided to start a company to take one particular aspect of Archy and bring it to the world. Andrew was my college roommate; the other two, Jono and Atul, were in Jef’s class on interface design at The University of Chicago. And that’s how Humanized was formed, and in particular how Enso, which then became Ubiquity for Firefox, ended up getting created.
HC: There’s an obvious connection between the two systems there. You were still iterating around the same idea, the vision that Jef had laid down. If Archy was the big vision, were you clear that you were still trying to get back to that, or would you rather see this broadly adopted, even if it’s just a small part?
AR: Yeah exactly, it was like “how do we take this to prototype and get it out?”.
For me the mantra that I was operating under then is that the best way to honour someone’s memory is to channel them into your own passion to make the world a better place.
The core concept that I was always super taken with in Archy is this idea that we need to switch to a model where you say what you want and the computer does it. So you can select some text and say ‘email this to Jono’, and it will know from my own past history whether to send an IM or a text or an email. You can select something anywhere and say ‘map this’, and it will go off to the Web and find a map; even if you’re in Microsoft Word it will inject an image back — or, if it can, something smarter, something live.
We thought that with this approach we could see how to implement it using accessibility controls, to bring it to computers as they stood back then.
HC: So there was an acquisition by Mozilla between Enso and Ubiquity — Ubiquity was Firefox-focussed?
AR: Yeah, exactly. There was actually a hidden thing in there: Mark Shuttleworth of Canonical tried to buy our team first, to have us go run design for Ubuntu. There was a good half-year where we were working on this deal where they wanted us to take some of Jef’s ideas, our ideas, some of the concepts of Archy, and make them part of, at that point, the third largest desktop OS in the world.
HC: I’m glad you mentioned that as I had this vague recollection that I had heard that rumour somewhere but I wasn’t able to source it.
AR: Yeah it was never published. We flew over to Spain, we met all the team, we spent a long time thinking about it, but it was never public, I guess. Now it can be. Well we never said we couldn’t, we just never talked about it.
HC: It seems like there were fans of, or acknowledgment of, Archy through the continuing work of Humanized. You mentioned Bret Victor and Mark Shuttleworth — are you aware of anyone else who was inspired by or affiliated with the project?
AR: Yeah, let’s see. Of course with Ubiquity we grew that up to two million users, which was pretty exciting — that got the largest adoption. You can see many similar ideas in Quicksilver and in Alfred. Nicholas Jitkoff and I certainly had a lot of conversations, so there was certainly inspiration that went both ways there. I never knew the team at Alfred but there was some sort of back and forth. iA Writer from Information Architects takes a lot of inspiration from Jef’s work and Archy, and they even thought about building LEAP keys at some point.
And in many ways — I don’t know the causal link — the voice commands of today, Siri and Cortana, all these things, to me trace back to Ubiquity and then Archy: a way of speaking to the computer to get it to do what you want it to do.
Even the zooming user interface with the iPhone, where you click on the folders and you sort of zoom in and zoom out. That’s like an actual ZUI used by hundreds of millions of people, as limited as it is.
HC: Even things like persistent documents — no-save and auto-save, restoring back to the current state — are a simpler piece of that which is now pretty pervasive. I don’t really recall it being that common then.
AR: It was not. I remember going round the early Web and just talking about how ridiculous it was that we still had the save icon as a floppy disk. I was like, dude, no one knows what this is. A lot of that kind of work — the Web 2.0 kind of work of using undo instead of a warning dialog, having the undo feature in Gmail — I think that certainly was in Jef’s work first. I don’t know how direct the inspiration was, but I think the argument could be made that Jef’s work was pretty influential there.
HC: A lot has happened since then. I’m thinking of the big change to personal computing which came in around 2007: the introduction of the iPhone. Obviously Archy and, in some sense, the other interfaces were very keyboard-driven, though you seem to be suggesting they could also have been voice-driven. Might the ZUI carry over to a multi-touch world to some extent?
AR: Yeah, I feel like the ZUI world could fit very well on a phone. It could be a really natural way of seeing all of your photos and all of your documents, and having a sort of spatial map. But the fundamental problem we always ran into with ZUIs, when we started to actually implement them, is the amount of input you have to provide to get some place versus just tapping on something. That was a problem we never solved: you’d have to navigate, zoom in and zoom out, as opposed to doing two or three taps to get some place. There was a tension in the design which we never fully resolved.
HC: Is your gut feeling that that was something that was just inherent, or was it something that required engineering?
AR: Yeah, I think it would require some real building and testing. The other paradigms have had thirty years now of A/B testing at an industry scale for what makes for a good GUI, and here we were starting from the very beginning. What makes a good ZUI? The rest of the world has a twenty to thirty year head start. So I think it would take some iteration to get it right.
HC: When you look back on the work you were doing with Archy and then through to Ubiquity, was there anything that you would have done differently, maybe if you’d had the full benefit of knowing what was to come?
AR: Oh man, so many things. Hindsight is a terrible mistress in that way. If I was going back, I still think the direction that we’ve since gone with Siri and Alexa — we could have pushed on that further. I’d have done it all open source, and I’d have done it cross-platform, and focussed in on that idea. We could have pushed it further, faster. We took the quasimodes very seriously, and some said that may have hampered adoption.
HC: Do you think that was too much of a habit shift for people?
AR: Yeah, and it was also really awkward on current keyboards. We always had to come up with these clever hacks that you could type at the same time and you had to hit shift twice, or you’d have to remap your caps lock key, it was just a little bit inelegant. So that I think would certainly be one change.
I think there was a big opportunity, in the early Web, to do a text editor. We were really inspired by MoonEdit, a collaborative text editor which had its own client, pre-Web. I think if we’d started with ‘Okay, what does the Web enable?’, and jumped in with the new things the Web could do, we could have used that as the thin end of the wedge to get the Archy-style text editor going, instead of doing it in Python as a desktop app. I think that would have been a really interesting way to have brought the ideas out. You can imagine what would have happened if Google Docs had been Archy. That would have been fascinating.
For a long time I've held this conviction that hierarchical file systems are a disaster. Over the years, there have been many attempts at building systems which avoided them: the entire hypermedia lineage from Vannevar Bush's Memex with trails, through Ted Nelson’s work, and on to efforts like HyperCard, along with completely different approaches like the Canon Cat (and related, Archy) and Lifestreams. The common motivation seems to be that nested folders are a bad match for human thought.
We sought to reduce the influence of hierarchical directories and conventional files (which we see as large lumps with stuck names in fixed places, with compulsory gratuitous naming — unsuited to overlap, interpenetration, rich connectivity, reasonable backtracking, and most human thinking and creative work.)
It’s hard to disagree — I’m relatively fastidious with my organization and my home folder is still full of incomprehensible structures accumulated over the years, not dissimilar to this XKCD comic:
However, history has not been kind to these projects — I’ve tried a few myself and they’re okay for a while, but I keep coming back to traditional systems, despite their flaws. Desktop operating systems are still based around folders, and more and more Web apps and mobile apps use a “simulated” files-and-folders representation, to the point where iOS 11 now adds a “Files” app that had been avoided for the previous ten releases.
In a way, this shouldn't be surprising. The Lindy Effect tells us that the longer something non-perishable, like an idea, has been around, the longer its expected lifespan. There’s clearly something powerful behind files and folders, and a thorough analysis of why might give us a framework by which to understand improvements and alternatives. I was stuck without a clear path forwards until I discovered the fantastic book The Science of Managing Our Digital Stuff by Ofer Bergman and Steve Whittaker. The book is a summary of the work done in the space of Personal Information Management (PIM) over the past few years, including many studies the authors designed themselves. From the blurb:
Bergman and Whittaker report that many of us use hierarchical folders for our personal digital organizing. Critics of this method point out that information is hidden from sight in folders that are often within other folders so that we have to remember the exact location of information to access it. Because of this, information scientists suggest other methods: search, more flexible than navigating folders; tags, which allow multiple categorizations; and group information management. Yet Bergman and Whittaker have found in their pioneering PIM research that these other methods that work best for public information management don’t work as well for personal information management.
From the book:
This book provides a scientific understanding of how we select, organize, and access such personal collections. Personal information management (PIM) is the process by which individuals curate their personal data in order to reaccess that data later. Curation involves three distinct processes: how we make decisions about what personal information to keep, how we organize that kept data, and the strategies by which we access it later.
A poorly designed PIM system can result in “lost personal data”, “large, disorganized personal collections of unclear value”, and “failing to deal with time-sensitive information that requires action”. So far, so good. However, there’s a fundamental tension at the heart of things:
Choosing appropriate folder organization and labels therefore requires people to predict exactly how they will be thinking about particular information at the time that they need to retrieve it. Predicting future retrieval context is difficult, because there are usually multiple ways that a file can be categorized, such as by author, topic, date or project. The inability to accurately predict how one will think about information in the future makes it more likely that future retrieval will fail.
After walking through a number of alternatives to folders (search, tagging, group management) and showing that in user studies they don’t perform as well as folders, they get to the crux of their argument. Navigating folders is a spatial task, rather than a linguistic one like search or tags, which uses a different part of the brain:
Throughout millions of years of evolution, humans have developed mechanisms that allow them to retrieve an item from a specific location (be it real or virtual) by navigating the path that they first followed when storing that information. These deep-rooted neurological biases lead to automatic activation of location-related routines, which have minimal reliance on linguistic processing, leaving the language system available for other tasks.
PIM systems have a purpose and should be measured and evaluated as such. This gives us a framework by which to start comparing and improving on the traditional folder structure, as they do in the latter part of the book. One suggestion, which I very much agree with, is to get rid of application specific storage locations:
Documents relating to a given project are stored in one folder hierarchy (e.g., in My Documents), emails in a separate mailbox hierarchy, and favorite websites in yet another browser-related hierarchy…Although the additional structure solution allows users to work in an integrated project environment, it requires managing yet another structure and may increase cognitive complexity. As well as the additional requirement to create a new structure, the user now has yet another retrieval location to maintain and remember.
The book also leaves open the question of how to manage the interface between public or group information management, where search, hyperlinks, and tags do quite well, and the spatial world of PIM. Perhaps systems should be designed to respect the boundary: shared work drives, like Google Drive, shouldn’t have a shared folder structure but should be tagged and linked (which are shown to work well in group settings and with novel content). Individuals can then pull those resources locally, with a stable location that they can navigate to when they need to exploit them.
There are some interesting technologies that I would have loved to have seen evaluated in the book. One is the Plan 9 style unioned directory structure, and whether it helps or hinders spatial navigation (I feel like it would help, as traditional directories mix orthogonal concerns, like which physical drive the data is stored on). I would also love to have seen an in-depth discussion of Zooming User Interfaces such as Pad. They have their own issues (a good overview here) but proponents did clearly understand the neurological processes behind spatial navigation:
We can find things in such a planning room [a room dedicated to project planning where the walls are covered in sticky notes, photos etc.] because we tend to remember landmarks and relative position. “The stuff about marketing is on the right wall, sort of lower down near the far corner,” someone might tell you. On another occasion, you go right to a particular document because you remember that it is just to the left of the orange piece of paper that Aviva put up.
— Jef Raskin, The Humane Interface
All of which is fascinating, but the real breakthrough realisation for me came with this section:
The main aim of information item classification in PIM is not to externalize our internal representation of these items (Hsieh et al. 2008) or to fully describe them, as implied by Civan et al. (2008), but to support easy, fast, and efficient retrieval.
Novel information management systems need to understand that they are tools to serve a purpose, and should conduct usability studies to ensure that they’re achieving it. The Lindy Effect is strong: files and folders have been around for a long time, and their staying power is testament to how well they achieve their task. Future PIM proposals would do well to internalise this and look to build on their strengths while addressing current weaknesses.
I often have this vague sense that the artefacts of computing that we see today are small, fragmented facets of some deeper “truth”. One area that I keep coming back to is the similarities and overlaps between file systems, objects, and Web resources.
Files, organised into directories, appear in almost every operating system since Multics, and there appears to be something essential about them. Mobile operating systems tried to remove files from the user’s view, but they’re making a comeback. It appears as though you can’t avoid them.
Wikipedia states: “A computer file is a computer resource for recording data discretely in a computer storage device.” This seems intuitively obvious — their origin is in the fact that RAM is volatile so information is lost when a machine loses power. That information should be persisted between restarts, and discrete sections of that information should be independently addressable.
It seems as though as an abstraction they’ve been massaged a little over the years, but their essential nature hasn’t changed — or has it? To the best of my knowledge, the notion that a “file” had to map to a disk region started to evaporate with the introduction of procfs in Eighth Edition Unix in 1984. It is convenient to represent run-time information in the same way as disk-backed information because you immediately inherit all the tools and utilities that work with files and directories. Files have the benefit of a name that can be passed around as a reference, and a structure that can be “walked”.
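The procfs idea can be sketched in miniature: instead of mapping names to disk blocks, map them to functions that generate content on demand. This is a toy illustration of the concept (the paths and helper names are my own invention, not how any real procfs is implemented):

```python
import os
import time

# A toy "synthetic filesystem": paths map to generator functions, so
# "reading a file" computes fresh content on demand -- no disk region
# backs the data, yet generic file tooling still works against it.
SYNTHETIC = {
    "/proc/uptime": lambda: f"{time.monotonic():.2f}\n",
    "/proc/self/pid": lambda: f"{os.getpid()}\n",
}

def read(path: str) -> str:
    """Return the current representation of a synthetic file."""
    try:
        return SYNTHETIC[path]()
    except KeyError:
        raise FileNotFoundError(path)

# Any generic "file tool" now operates on live run-time information:
for path in sorted(SYNTHETIC):
    print(path, "->", read(path).strip())
```

The point is that once run-time state has a name and a readable representation, everything that already understands names and bytes (ls, cat, grep, backup tools) comes along for free.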
Plan 9 took things further and introduced control files, which allow a program to write a command to one file and read a response from another. This starts to look a lot like methods on objects, something pointed out in Stephen Kell’s fascinating paper “The Operating System: Should There Be One?”, which also notes that this ad-hoc agglomeration of behaviour has left some semantic gaps:
As the filesystem’s use has expanded, its semantics have become less clear. What do the timestamps on a process represent? What about the size of a control file? Is a directory tree always finite in depth (hence recursable-down) or in breadth (hence readdir()-iterable)? Although some diversity was present even when limited to files and devices (is a file seekable? what ioctls does the device support?), semantic diversity inevitably strains a fixed abstraction.
The result is a system in which the likelihood of a client’s idea of “file” being different from the file server’s idea is ever-greater. It becomes ill-defined whether “the usual things” one can do with files will work. Can I use cp to take a snapshot of a process tree? It is hard to tell. The selection of what files to compose with what programs (and fixing up any differences in expected and provided behaviour) becomes a task for a very careful user. Unlike in Smalltalk, semantic diversity is not accompanied with any meta-level descriptive facility analogous to classes.
Yet another angle is taken by the Web, which names resources by URL, which, when fetched, return a representation. This could be a straightforward static read of bytes on a disk, but equally can be dynamically generated on demand, personalised for each client.
HTTP exposes a small set of verbs to operate on those resources, and HTML itself only really uses two (GET and POST), but the real power of the Web model is that the behaviours exposed by the server are embedded in the resource itself. This enables a decentralised system, at the cost of meta-level standardisation.
There are clear parallels to the Unix file system structure in the design of URLs. Tim Berners-Lee wrote: “In many web servers like the classic Apache, the URL space maps directly to chunks of the unix file system. This is deliberate as many good things come with the unix file system.” And the similarity to objects and message passing has been noted many times. What I hadn’t appreciated until recently is that Tim made the argument that resources subsumed not only files, but folders too:
It is crazy, if you think about it, that the whole screen is used to represent the information which happens to be on your local file system, using the metaphors of folders, while one window is used to represent the information in the rest of the world, using the metaphor of hypertext. What's the difference between hypertext and a desktop anyway? You can double click on things you find in either. Why can't I put folders into my hypertext documents? Why can't I write on the desk? Folders should be just another sort of document. My home page could be one, or it could be a hypertext document. The concepts of "folder" and "document" could be extended until they were the same, but I don't think that that would be necessarily a good idea. It's OK to have different forms of object for distinctly different uses.
(As an aside, when I’m looking at open-source code, I often find myself preferring the GitHub file browser. My best guess as to why is that it renders Readme documents at the same level as the folder contents, which makes navigating around large, appropriately documented codebases much simpler.)
While there is a large overlap between objects, files, and resources, systems tend to treat them very differently; clients for file systems are focused around creating and managing files, whereas Web clients seem generally built around the assumption that users are browsing existing information, and objects are generally left to programming languages, or at best, programming environments like Smalltalk or Self.
As with the introduction of the proc filesystem, I think there’s a huge amount of power to be gained from providing a consistent model of interaction with addressable items in the system. Software could become simpler and more powerful by virtue of not being siloed by arbitrary type, and users would have a smaller mental burden. There are a couple of bright spots on the horizon: Google’s experimental operating system, Fuchsia, seemingly takes inspiration from Plan 9 with its namespaces, and Upspin, a distributed file system for personal data, has APIs that encourage dynamic services being exposed as files.
I think that there could be some powerful, universal model of recording and accessing state in a system, and these three approaches are all converging on this point in the design space, albeit from very different angles. I believe that in the future these models will inevitably be treated in a unified manner, and I think that computing systems that do so will be conceptually much simpler than the fragmented systems of today.
Marc Andreessen of Netscape announced a set of new products that would help transform their browser into what he called an "Internet OS" that would provide the tools and programming interfaces for a new generation of Internet-based applications. The so-called "Internet OS" would still run on top of Windows — being based around Netscape Navigator — but he dismissed desktop operating systems like Windows as simply "bag[s] of drivers", reiterating that the goal would be to "turn Windows into a mundane collection of not entirely debugged device drivers".
— Internet OS, Wikipedia
The Internet OS is the natural result of asking a simple question — why are there two platforms for applications in current-day systems, native and Web? For proponents of the Internet OS, the answer is simple: get rid of native apps, and standardize around the Web. The most prominent example of this philosophy today is Chrome OS, Google’s stripped-back version of Linux, designed to run the Chrome browser exclusively.
It’s worth noting that the term “OS” is a little overloaded these days, but clearly, in this sense, it is not being used to cover the lowest level hardware control (kernel, device drivers, etc.) but the user-space platform; the APIs and services that user-applications are programmed against.
I am very much in favour of such an effort. A divided world of behaviour seems wasteful, and I think a cross-device, hyperlinked environment is a stronger starting point than a single-machine, hardware-oriented abstraction layer. But despite 20-plus years of focus and investment in the space, it still hasn’t taken off. Even though there’s a constant stream of new, high-quality Web apps, I can’t help but feel that something is missing. I frequently have to step out of the browser environment to get things done. And it doesn’t seem as though I’m alone in that:
So far, a big fan of ChromeOS and Crouton. Can easily chroot over to Ubuntu and get stuff done then flip back.
It’s important to ask why this is. A huge amount of effort is currently being spent to close the apparent gap between native apps and Web apps by focusing on features like speed and minor integration points like notifications. These are important, but I don’t think that they’re at the root of the issue. Similarly, if we just start copying features blindly from existing OSs, we may end up replicating a world that is sub-optimal or a dead-end, and hamper further development.
As mentioned in the previous post Killing Apps, I believe that a huge amount of friction in software comes at the edges of applications, where inadequate interoperability solutions mean that sophisticated day-to-day work activities are slow and painful. But at least on the desktop, there are default interoperability solutions — predominantly around the file system in UNIX-based OSs (exclusively so in Plan 9 and Inferno) or objects in Smalltalk. While Web applications can expose APIs, there is nothing native to the browser/server relationship that allows me to take advantage of them; if I wish to open a presentation made in Slides in another program, one (or both) of them must have been programmed to support the specific API of the other.
The problem here really feels like a prevalence of early binding, where the data that Web applications hold can’t be reused in any situation, or in any way, that wasn’t anticipated by the author. The Web’s architecture is great for client/server state transfer, but lacks a primitive, universal interface for composing services. Through the file system or object methods, multiple tools can be brought to bear on a problem, and this just isn’t possible using only the traditional browser model.
Here’s an example I was facing today: how to import historical data from one Web app into another. The suggested solution? Download a file in one app and upload it in the other (or use a third-party app which has been explicitly designed for this transfer). If you wanted to download a lot of files, tough: it was one by one. Bottoming out to the file system like this is an okay solution in some ways, but it doesn’t feel native to the Web, and again, it relies on application authors to explicitly enable the option. (A desktop app would, by default, store its data on the file system somewhere.)
The most important role of the system is to provide a file system
The lack of a consistent, universal interface for working with data across applications has hit the browser a number of times. In its original incarnation, the Web was supposed to be a read-write medium.
Tim Berners-Lee’s Enquire, which predates his work on HTML, had Link, Add, and Edit items in the menu. Also, for many years, the W3C used to develop the Amaya browser, which allowed for the creation of any resource, including SVG.
I’m a big believer in the intent behind these efforts, so it’s worth spending some time thinking about why they failed. HTML was clearly intended to be a data storage format — resources would be saved to disk as HTML and served as-is. When document-editing features were dropped from browsers (leaving only form filling), there was an effort in WebDAV to rectify the situation with an additional set of interfaces. But even the WebDAV model assumes a relatively static site being served from a file system, with something like index.html in each folder.
4.5. Source Resources and Output Resources
Some HTTP resources are dynamically generated by the server. For these resources, there presumably exists source code somewhere governing how that resource is generated. The relationship of source files to output HTTP resources may be one to one, one to many, many to one, or many to many. There is no mechanism in HTTP to determine whether a resource is even dynamic, let alone where its source files exist or how to author them. Although this problem would usefully be solved, interoperable WebDAV implementations have been widely deployed without actually solving this problem, by dealing only with static resources. Thus, the source vs. output problem is not solved in this specification and has been deferred to a separate document.
— RFC 4918
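To give a flavour of what WebDAV’s “additional set of interfaces” looks like on the wire, here is a minimal PROPFIND request (the host and path are illustrative), asking a collection for the last-modified time and type of each of its immediate members:

```
PROPFIND /docs/ HTTP/1.1
Host: example.org
Depth: 1
Content-Type: application/xml

<?xml version="1.0" encoding="utf-8"?>
<propfind xmlns="DAV:">
  <prop>
    <getlastmodified/>
    <resourcetype/>
  </prop>
</propfind>
```

The `Depth: 1` header is what makes a resource behave like a folder listing — and, as the RFC admits above, the whole scheme only holds together when resources really are static files underneath.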
Some companies, such as Google, have one solution — Google Drive. This is a cloud-native file system, and their applications, and those of third parties, may use it as an interoperability backplane. It clearly addresses some issues here, but it isn’t standardised, nor deeply embedded in the browser (File > Save). Most importantly, though, it feels like an abdication of imagination; a cloud file system is better than nothing, but I’m not convinced that it’s the ultimate solution. Just getting the browser back to where desktops were 40 years ago does not feel like progress.
In casual conversation with friends, there seems to be a pervasive thought that the Web is deficient, but I don’t see a huge amount of effort to fix it, barring cosmetic work around the edges.
Of those who are working on solutions, there seems to be a spectrum of approaches between two extremes — one would be to tear down the Web and rebuild a more coherent system that natively allowed late binding of behaviour over different sources of data, and the other would be to embrace the Web, warts and all, and look for incremental advances, or interfaces that could be standardized, that allow us to move forward while preserving what’s worked.
Alan Kay, on the mailing list for the VPRI project, is evidently in the first camp:
For example, one of the many current day standards that was dismissed immediately is the WWW (one could hardly imagine more of a mess).
But the functionality plus more can be replaced in our "ideal world" with encapsulated confined migratory VMs ("Internet objects") as a kind of next version of Gerry Popek's LOCUS.
The browser and other storage confusions are all replaced by the simple idea of separating out the safe objects from the various modes one uses to send and receive them. This covers files, email, web browsing, search engines, etc. What is left in this model is just a UI that can integrate the visual etc., outputs from the various encapsulated VMs, and send them events to react to. (The original browser folks missed that a scalable browser is more like a kernel OS than an App)
For the other path, the most fascinating track that I’m aware of is Webstrates. It fundamentally uses the core concepts of the Web, but with just a few small tweaks it not only solves interoperability but also tackles the app/document divide that a simple file-based approach reintroduces.
All of which might lead to optimism about the future of the Web, but there are some fundamental blockers in place. The recent trend in browsers is towards ever larger and more complex codebases supporting a mish-mash of services and APIs. This complexity means that only a few large organisations can really back browser development, and there are many vested interests (link to recent DRM news from EFF) that might want to see this trend continue. Or perhaps the industry is cyclical and, like the IE6 era, this is just a pause before a wave of creative innovation that fundamentally moves the Web forward, and finally establishes the Internet OS as a solution to competing user-space platforms.
Recently, I needed to create a presentation for work to introduce a lot of technical material to groups of people as efficiently as possible. Google Slides was the default choice — it’s where all our other presentations are saved. Easy, right? Unfortunately, Slides has some shortcomings. It doesn’t have the same “suggestion” mode that Google Docs has, so I pasted long sections into Docs for collaboration. I had fairly complicated network diagrams created with d3’s force-directed layout engine; Slides can’t embed HTML objects or even SVG, so I took screenshots which I uploaded. There were code samples to include, but copying and pasting from Atom (which, for various reasons, was my only option) doesn’t copy the highlighting. I converted the grammar file to the Pygments format, ran the code through the Pygments command line, piped it into the clipboard, and then pasted.
I’d estimate that of the time spent on the presentation, more than 50% of it was lost to dealing with the friction between applications, and I don’t think this experience is unusual. It seems as though almost all large pieces of work I need to do cut across a number of distinct applications — either graphical or terminal — and a substantial proportion of my time is lost to hacking around deficiencies. It may be a case of missing features, features that are in one app and not another, or poor interoperability between data formats. That said, there are periods when I brace myself for this pain and it never materialises — barriers and limitations just seem to drop away and it feels as though I am in harmony with the computer. Sadly, moments such as this occur infrequently.
If I try and put a finger on what is different during the harmonious periods, I’d say it’s that the machine feels “malleable” — the software is not forcing me to jump through its hoops, it’s bending (or rather, easily being bent) to my will. This has led me to believe that applications — loosely in the sense of the Wikipedia definition — cannot be a part of humane computing. They simply impede the easy creation of sophisticated content.
To me, the problem in the definition is the word “coordinated” — this is determined by the application author at design-time, rather than the end user at the moment of use. Many tasks require using a small set of functions across a number of applications, and we work around this limitation either with “plumbing”, like communicating through files or sockets, or on demand with drag and drop or copy and paste.
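That file-and-pipe style of plumbing can be sketched in a few lines (assuming a POSIX `sort` is on the PATH): two unrelated programs cooperate through nothing but a byte stream, with no shared application code.

```python
import subprocess

# "Plumbing" in the Unix sense: Python generates some data, and a
# completely separate tool (`sort -u`) refines it. The only contract
# between the two programs is a stream of newline-delimited text.
words = "pear\napple\nbanana\napple\n"

result = subprocess.run(
    ["sort", "-u"],          # any filter program could stand in here
    input=words,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # -> apple, banana, pear; one per line
```

The cost of this flexibility is that everything must be flattened to bytes at each boundary, which is exactly the kind of lowest-common-denominator coordination the rest of this post argues against.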
People are often stuck with mundane applications provided by others. This is problematic because they cannot look under the hood to understand how things work, or modify applications to suit their own needs.
— Towards Making a Computer Tutor for Children of All Ages
Some might say that open source can help, but I don’t think it’s sufficient in and of itself. Apps can be huge, monolithic bundles, with radically different internal architectures. If modules haven’t been explicitly designed to be reused (and they are, in all likelihood, in a different language), it can be almost impossible to pull out anything of substance. It can take days or weeks to orient yourself around a large codebase. It’s a large up-front cost that’s rarely worth it.
The recent industry move to mobile and the Web has been a blessing and a curse. Mobile hides many low-level details, such as the filesystem, and for a long time iOS especially didn’t offer a good interoperability story outside of a few system-managed structures. The benefit is that individual apps are now a lot more lightweight, and recent additions like the share pane and iCloud Drive mean there’s a backplane to bounce data around. The Web is now a mature application delivery platform, but each document comes bundled with the entire application around it. In the case of my Google Slides situation, I couldn’t use a different app with better diagram layout or code-highlighting abilities on my base presentation. The upside is that a whole load of Web apps now expose APIs, which can allow pretty sophisticated interoperability (RIP Yahoo Pipes), but it requires the API author to expose the full set of functionality you need (sadly, the Slides API doesn’t offer anything that could have helped with my needs above — it was the first thing I checked).
So what’s the solution? As with most things, Smalltalk was on top of this in the 70s:
Smalltalk has unlimited numbers of "Projects". Each one is a persistent environment that serves both as a place to make things and as a "page" of "desktop media".
There are no apps, only objects and any and all objects can be brought to any project which will preserve them over time. This avoids the stovepiping of apps. Dan Ingalls (in Fabrik) showed one UI and scheme to integrate the objects, and George Bosworth's PARTS system showed a similar but slightly different way. Also there is no "presentation app" in Etoys, just an object that allows projects to be put in any order — and there can many many such orderings all preserved — and there is an object that will move from one project to the next as you give your talk. "Builds", etc., are all done via Etoy scripts. This allows the full power of the system to be used for everything, including presentations. You can imagine how appalled we were by the appearance of Persuasion and PowerPoint, etc..
Smalltalk is, in many ways, the clearest expression of thinking beyond apps. You work with a set of rich, flexible objects which carry a huge amount of composable behaviour with them. In the case of something like my presentation, it would be completely natural in a Smalltalk environment to embed a computed graph or highlighted code, and they’d likely be live-editable and dynamic too.
However, I think some other environments also made fascinating stabs in this direction. One of the boldest efforts that I’m aware of is the Canon Cat — which brought data (predominantly text) front and centre, and task-based commands to operate on selections of that data. Another direction is a lineage from Oberon, through Rob Pike’s Help environment for Plan 9, and into its final form as the Acme editor. Emacs, the “great operating system, lacking only a decent editor”, has a promising approach in its composable modes. More recently, conversational UIs (either message- or voice-based) are unbundling apps in favour of lightweight, on-demand interactions. I plan to dive into all of these in more depth in future posts.
I believe that it’s possible to break out of the situation that we find ourselves in today, and that computing systems of the future will adapt more fluidly: systems which will enable users, and collaborative teams of users, to create digital material that’s more sophisticated with significantly less effort. However, given the state of the industry, I’m not confident this will happen without substantial effort. If an effort is to be made, I think a good start is to take a look at interesting historical systems and imagine what they might look like today; to try and work our way out of a very narrow path that we stumbled upon by mistake:
We thought we'd done away with both "operating systems" and with "apps" but we'd used the wrong wood in our stakes — the vampires came back in the 80s. One of the interesting misunderstandings was that Apple and then Microsoft didn't really understand the universal viewing mechanism (MVC) so they thought views with borders around them were "windows" and views without borders were part of "desktop publishing", but in fact all were the same. The Xerox Star confounded the problem by reverting to a single desktop and apps and missed the real media possibilities. They divided a unified media world into two regimes, neither of which are very good for end-users.