When I originally described the flexible Qubes Odyssey framework several months ago, I mentioned that we would even consider to use “Windows Native Isolation” mechanisms as a primitive type of isolation provider (“hypervisor”) for some basic edition of Qubes for Windows. The idea has been very attractive indeed, because with minimal effort we could allow people to install and run such Qubes WNI on their normal, consumer Windows laptops.
Sure, the inter-process isolation provided by a monolithic kernel such as Windows or Linux could never be compared to the inter-VM isolation offered even by the most lousy hypervisors. This is simply because the sizes of the interfaces exposed to untrusted entities (processes in case of a monolithic kernel; VMs in case of a hypervisor) are just incomparable. Just think about all those Windows system calls and GDI calls which any process can call and which contains probably thousands of bugs still waiting to be discovered by some kid with IDA. And think about those tens of thousands of drivers, which also expose (often unsecured) IOCTLs, as well as parsing the incoming packets, USB devices infos, filesystem metadata, etc. And then think about various additional services exposed by system processes, which are not part of the kernel, but which are still trusted and privileged. And now think about the typical interface that needs to be exposed to a typical VM: it's “just” the virtualized CPU, some emulated devices (some old-fashined Pentium-era chipset, SVGA graphics adapter, etc) and virtualized memory.
Anyway, knowing all this, I still believed that Qubes WNI would make a whole lot of sense. This is because Qubes WNI would still offer a significant boost over the “Just Windows” default security, which is (still) essentially equivalent to the MS-DOS security model. And this is a real pity, because Windows OS has long implemented very sophisticated security mechanisms, such as complex ACLs applicable to nearly any object, as well as recent mechanisms such as UIPI/UAC, etc. So, why not use all those sophisticated security to bring some real-world security to Windows desktops!
And, best of all, once people start using Qubes WNI, and they liked it, they could then pretty seamlessly upgrade to Xen-based Qubes OS, or perhaps Hyper-V-based Qubes OS (when we implement it) and their system would look and behave very similarly. Albeit with orders of magnitude stronger security. Finally, if we could get our Odyssey Framework to be flexible enough to support both Qubes WNI, as well as Xen-based Qubes OS, we should then be able to support any hypervisor or other isolation mechanism in the future.
And so we decided to build the Qubes WNI. Lots of work we invested in building Qubes WNI was actually WNI-independent, because it e.g. covered adjusting the core Odyssey framework to be more flexible (after all “WNI” is quite a non-standard hypervisor) as well as some components that were Windows-specific, but not WNI-specific (e.g. could very well be used on Hyper-V based Qubes OS in the future). But we also invested lots of time into evaluating all those Windows security mechanisms in order to achieve our specific goals (e.g. proper GUI isolation, networking isolation, kernel object spaces isolation, etc)...
Sadly this all has turned out to be a story without a happy end, as we have finally came to the conclusion that consumer Windows OS, with all those one-would-think sophisticated security mechanisms, is just not usable for any real-world domain isolation.
And today we publish a technical paper about our findings on Windows security model and mechanisms and why we concluded they are inadequate in practice. The paper has been written by Rafał Wojdyła who joined ITL a few months ago with the main task of implementing Qubes WNI. I think most people will be able to learn a thing or two about Windows security model by reading this paper.
Also, we still do have this little hope that somebody will read the paper and then write to us: “Oh, you're guys so dumb, you could just use this and that mechanism, to solve all your problems with WNI!” :)
The paper can be downloaded from here.
Thursday, January 16, 2014
Wednesday, December 11, 2013
Today we're releasing Qubes R2 Beta 3, one of the latest milestones on our roadmap for Qubes R2. Even though it is still called a “beta”, most users should install it, because, we believe, it is the most polished and stable Qubes edition. Looking back, I think it was a mistake to use this alpha/beta/rc nomenclature to mark Qubes releases, and so, starting with Qubes R3 we will be just using version numbers: 3.0, 3.1, etc.
Anyway, back to the R2 Beta 3 – below I discuss some of the highlights of the today's release:
- The seamless GUI virtualization for Windows 7-based AppVMs, and support for HVM-based templates (e.g. Windows-based templates) is one of the most spectacular feature of this release, I think. It has already been discussed in an earlier blog post, and now instructions have also been added to the wiki for how to install and use such Windows AppVMs.
- We've also introduced a much more advanced infrastructure for system backups, so it is now possible to make and restore backups to/from untrusted VMs, which allows e.g. to backup easily the whole system to a NAS, or just to an USB device, not worrying that somebody might exploit the NAS client over the network, or that plugging of the USB disk with malformed partition table or filesystem might compromise the system. The whole point here is that the VM that handles the backup storage (and which might be directing it to a NAS, or somewhere) might be compromised, and it still cannot do anything that could compromise (or even DoS) the system, neither can it sniff the data in the backup. I will write more about the challenges we had to solve and how we did it in a separate blog post. I'm very proud to note that majority of the implementation for this has been contributed by the community, specifically Oliver Medoc. Thanks!
- A very simple feature, trivial almost, yet very important from the security point of view – it is now possible to set 'autostart' property on select VMs. Why is this so important for security? Because I can create e.g. UsbVM, assign all my USB controllers to it, and then once I set it as autostarting, I can have assurance that all my USB controllers will be delegated to such AppVM immediately upon each system boot. Having such a UsbVM is a very good idea, if one is afraid of physical attacks coming though USB devices. And it now could double as a BackupVM with this new backup system mentioned above!
- To improve hardware compatibility we now ship the installer with multiple kernel versions (3.7, 3.9, and 3.11) allowing to run the installation using any of those, e.g. if it turned out that one kernel doesn't support the graphics card correctly -- a typical problem many users faced in the past. All the kernels are also installed in the final system, allowing the user to easily boot with a select Dom0 kernel later, choosing the one which supports their hardware best.
- Another popular problem of the past now was the lack of support for dynamically changing resolution/screen layout in the AppVMs when a seccond monitor or a projector was hot-plugged in (which changed only the resolution layout in Dom0). Now this problem has been solved and the new monitor layout is dynamically propagated to the AppVMs, allowing to use all the screen real estate by the apps running there.
- There has also been a significant amount of cleanups and fixes. This includes the unification of paths and command names (“The Underscore Revolution” as we call it), as well as refactoring of all the source code components (which now closely matches what we have on Qubes Odyssey/R3), and lots of various bugfixes.
We're planning one more release (Qubes R2 RC1) before the final R2, which will bring improvements mostly in the area of more polished UI, such as allowing some of the tasks that currently require commandline to be done from the Qubes Manager. So, this would mostly be a minor cosmetic upgrade, plus bugfixes. And probably we will also upgrade the default Linux template to Fedora 20.
Installation and upgrade instructions can be found here.
Tuesday, November 26, 2013
Finally, after months of hard work, seamless mode for Windows 7 AppVMs is coming to Qubes OS! The new Windows Support Tools will be released together with the Qubes OS R2 Beta 3, which we plan to release in the next 1-2 weeks. Here is an obligatory screenshot showing a few Windows apps running in seamless mode integrated onto Qubes trusted desktop (note the usual Qubes trusted decorations around each of the Win7 windows):
The seamless mode for Windows AppVMs is not yet as polished as the one we have for Linux AppVMs, because, unlike what we do for Xorg, the Windows GUI agent is not based on composition buffers extraction. This causes some, rather minor, cosmetic problems. For example, when we have two overlapping windows from a Win7 AppVM, and move the top window away, its remaining "shadow" will be visible on the underlying window for the duration of the operation. But generally this all works reasonably good, and you should not really feel any slowness or heaviness compared to Linux AppVMs virtualization. It should be noted that we managed to add this seamless support for Windows AppVMs without any changes to our secure GUI virtualization protocol.
Of course, the usual Qubes integration features, such as secure inter-VM clipboard and file copy also work for Windows AppVMs with the tools installed.
The Qubes Windows Support Tools are proprietary, but they are supposed to be installed only in the Windows 7 VMs, which themselves contain millions of lines of proprietary code already. Besides that, the tools do not introduce any other modifications to the system.
As a special bonus we have also added (and releasing also in R2B3) the support for template-based HVMs. So it will now be possible to do something like this:
qvm-create --hvm work-win7 --template win7-x64 --label greenqvm-create --hvm personal-win7 --template win7-x64 --label purple
qvm-create --hvm testing-win7 --template win7-x64 --label red
All such template-based AppVMs use the root filesystem from the Template VM, which is shared in a read-only manner, of course, but Qubes makes it look for the AppVMs as if the root filesystem was writable. Just like in case of Linux AppVMs, the actual writes are stored in COW buffers backed by files stored in each of the AppVMs directories. Upon AppVM's reboot, those files are discarded, which reverts the VMs' root filesystems back to that of the template (the “golden image”).
For the above mechanism to make any sense we should configure the OS in the Template VM to use a separate disk for the user's home directory(ies) (e.g. C:\Users in case of Windows). Qubes automatically exposes an additional private disk to each of the AppVMs exactly for this very purpose. Again, just like it has been done for Linux AppVMs for years.
The above feature allows to create lots of Windows AppVMs quickly and with minimal use of disk space, and with an ability to centrally update all the system software in all the AppVMs all at once. Just like for Linux AppVMs.
Users should, however, ensure that their license allows for such instantiating of the OS they use in the template. Note that from the technical point of view the OS is installed, and, in case of Windows, also activated, only once: in the template VM. The installed files are never copied, they are only shared with the running instances of AppVMs. Consult your software licensing lawyer.
Monday, September 23, 2013
In the first part of this article published a few weeks ago, I have discussed the basics of Intel SGX technology, and also discussed challenges with using SGX for securing desktop systems, specifically focusing on the problem of trusted input and output. In this part we will look at some other aspects of Intel SGX, and we will start with a discussion of how it could be used to create a truly irreversible software.
SGX Blackboxing – Apps and malware that cannot be reverse engineered?
A nice feature of Intel SGX is that the processor automatically encrypts the content of SGX-protected memory pages whenever it leaves the processor caches and is stored in DRAM. In other words the code and data used by SGX enclaves never leave the processor in plaintext.
This feature, no doubt influenced by the DRM industry, might profoundly change our approach as to who controls our computers really. This is because it will now be easy to create an application, or malware for that matter, that just cannot be reversed engineered in any way. No more IDA, no more debuggers, not even kernel debuggers, could reveal the actual intentions of the EXE file we're about to run.
Consider the following scenario, where a user downloads an executable, say blackpill.exe, which in fact logically consists of three parts:
- A 1st stage loader (SGX loader) which is unencrypted, and which task is to setup an SGX enclave, copy the rest of the code there, specifically the 2nd stage loader, and then start executing the 2nd stage loader...
- The 2nd stage loader, which starts executing within the enclave, performs remote attestation with an external server and, in case the remote attestation completes successfully, obtains a secret key from the remote server. This code is also delivered in plaintext too.
- Finally the encrypted blob which can only be decrypted using the key obtained by the 2nd stage loader from the remote server, and which contains the actual logic of the application (or malware).
We can easily see that there is no way for the user to figure out what the code from the encrypted blob is going to do on her computer. This is because the key will be released by the remote server only if the 2nd stage loader can prove via remote attestation that it indeed executes within a protect SGX enclave and that it is the original unmodified loader code that the application's author created. Should one bit of this loader be modified, or should it be attempted to run outside of an SGX enclave, or within a somehow misconfigured SGX enclave, then the remote attestation would fail and the key will not be obtained.
And once the key is obtained, it is available only within the SGX enclave. It cannot be found in DRAM or on the memory bus, even if the user had access to expensive DRAM emulators or bus sniffers. And the key cannot also be mishandled by the code that runs in the SGX enclave, because remote attestation also proved that the loader code has not been modified, and the author wrote the loader specifically not to mishandle the key in any way (e.g. not to write it out somewhere to unprotected memory, or store on the disk). Now, the loader uses the key to decrypt the payload, and this decrypted payload remains within secure enclave, never leaving it, just like the key. It's data never leaves the enclave either...
One little catch is how the key is actually sent to the SGX-protected enclave so that it could not be spoofed in the middle? Of course it must be encrypted, but to which key? Well, we can have our 2nd stage loader generate a new key pair and send the public key to the remote server – the server will then use this public key to send the actual decryption key encrypted with this loader's public key. This is almost good, except for the fact that this scheme is not immune to a classic main in the middle attack. The solution to this is easy, though – if I understand correctly the description of the new Quoting and Sealing operations performed by the Quoting Enclave – we can include the generated public key hash as part of the data that will be signed and put into the Quote message, so the remote sever can be assured also that the public key originates from the actual code running in the SGX enclave and not from Mallory somewhere in the middle.
So, what does the application really do? Does it do exactly what has been advertised by its author? Or does it also “accidentally” sniffs some system memory or even reads out disk sectors and sends the gathered data to a remote server, encrypted, of course? We cannot know this. And that's quite worrying, I think.
One might say that we do accept all the proprietary software blindly anyway – after all who fires up IDA to review MS Office before use? Or MS Windows? Or any other application? Probably very few people indeed. But the point is: this could be done, and actually some brave souls do that. This could be done even if the author used some advanced form of obfuscation. Can be done, even if taking lots of time. Now, with Intel SGX it suddenly cannot be done anymore. That's quite a revolution, complete change of the rules. We're no longer masters of our little universe – the computer system – and now somebody else is.
Unless there was a way for “Certified Antivirus companies” to get around SGX protection.... (see below for more discussion on this).
...And some good applications of SGX
The SGX blackboxing has, however, some good usages too, beyond protecting the Hollywood productions, and making malware un-analyzable...
One particularly attractive possibility is the “trusted cloud” where VMs offered to users could not be eavesdropped or tampered by the cloud provider admins. I wrote about such possibility two years ago, but with Intel SGX this could be done much, much better. This will, of course, require a specially written hypervisor which would be setting up SGX containers for each of the VM, and then the VM could authenticate to the user and prove, via remote attestation, that it is executing inside a protected and properly set SGX enclave. Note how this time we do not require the hypervisor to authenticate to the users – we just don't care, if our code correctly attests that it is in a correct SGX, it's all fine.
Suddenly Google could no longer collect and process your calendar, email, documents, and medial records! Or how about a tor node that could prove to users that it is not backdoored by its own admin and does not keep a log of how connections were routed? Or a safe bitcoin web-based wallet? It's hard to overestimate how good such a technology might be for bringing privacy to the wide society of users...
Assuming, of course, there was no backdoor for the NSA to get around the SGX protection and ruin this all goodness...(see below for more discussion on this).
New OS and VMM architectures
In the paragraph above I mentioned that we will need specially written hypervisors (VMMs) that will be making use of SGX in order to protect the user's VMs against themselves (i.e. against the hypervisor). We could go further and put other components of a VMM into protected SGX enclaves, things that we currently, in Qubes OS, keep in separate Service VMs, such as networking stacks, USB stacks, etc. Remember that Intel SGX provides convenient mechanism to build inter-enclave secure communication channels.
We could also take the “GUI domain” (currently this is just Dom0 in Qubes OS) and move it into a separate SGX enclave. If only Intel came up with solid protected input and output technologies that would work well with SGX, then this would suddenly make whole lots of sense (unlike currently where it is very challenging). What we win this way is that no longer a bug in the hypervisor should be critical, as it would be now a long way for the attacker who compromised the hypervisor to steal any real secret of the user, because there are no secrets in the hypervisor itself.
In this setup the two most critical enclaves are: 1) the GUI enclave, of course, and 2) the admin enclave, although it is thinkable that the latter could be made reasonably deprivileged in that it might only be allowed to create/remove VMs, setup networking and other policies for them, but no longer be able to read and write memory of the VMs (Anti Snowden Protection, ASP?).
And... why use hypervisors? Why not use the same approach to compartmentalize ordinary operating systems? Well, this could be done, of course, but it would require considerable rewrite of the systems, essentially turning them into microkernels (except for the fact that the microkernel would no longer need to be trusted), as well as the applications and drivers, and we know that this will never happen. Again, let me repeat one more time: the whole point of using virtualization for security is that it wraps up all the huge APIs of an ordinary OS, like Win32 or POSIX, or OSX, into a virtual machine that itself requires orders of magnitude simpler interface to/from the outside world (especially true for paravirtualized VMs), and all this without the need to rewrite the applications.
Trusting Intel – Next Generation of Backdooring?
We have seen that SGX offers a number of attractive functionality that could potentially make our digital systems more secure and 3rd party servers more trusted. But does it really?
The obvious question, especially in the light of recent revelations about NSA backdooring everything and the kitchen sink, is whether Intel will have backdoors allowing “privileged entities” to bypass SGX protections?
Traditional CPU backdooring
Of course they could, no question about it. But one can say that Intel (as well as AMD) might have been having backdoors in their processors for a long time, not necessarily in anything related to SGX, TPM, TXT, AMT, etc. Intel could have built backdoors into simple MOV or ADD instructions, in such a way that they would automatically disable ring/page protections whenever executed with some magic arguments. I wrote more about this many years ago.
The problem with those “traditional” backdoors is that Intel (or a certain agency) could be caught using it, and this might have catastrophic consequences for Intel. Just imagine somebody discovered (during a forensic analysis of an incident) that doing:
MOV eax, $deadbeef
MOV ebx, $babecafe
ADD eax, ebx
...causes ring elevation for the next 1000 cycles. All the processors affected would suddenly became equivalents of the old 8086 and would have to be replaced. Quite a marketing nightmare I think, no?
Next-generation CPU backdooring
But as more and more crypto and security mechanisms got delegated from software to the processor, the more likely it becomes for Intel (or AMD) to insert really “plausibly deniable” backdoors into processors.
Consider e.g. the recent paper on how to plant a backdoor into the Intel's Ivy Bridge's random number generator (usable via the new RDRAND instruction). The backdoor reduces the actual entropy of the generator making it feasible to later brute-force any crypto which uses keys generated via the weakened generator. The paper goes into great lengths describing how this backdoor could be injected by a malicious foundry (e.g. one in China), behind the Intel's back, which is achieved by implementing the backdoor entirely below the HDL level. The paper takes a “classic” view on the threat model with Good Americans (Intel engineers) and the Bad Chinese (foundry operators/employees). Nevertheless, it should be obvious that Intel could have planted such a backdoor without any effort or challenge described in the paper, because they could do so at any level, not necessarily below HDL.
But backdooring an RNG is still something that leaves traces. Even though the backdoored processor can apparently pass all external “randomness” testes, such as the NIST testsuite, they still might be caught. Perhaps because somebody will buy 1000 processors and will run them for a year and will note down all the numbers generated and then conclude that the distribution is quite not right. Or something like that. Or perhaps because somebody will reverse-engineer the processor and specifically the RNG circuitry and notice some gates are shorted to GND. Or perhaps because somebody at this “Bad Chinese” foundry will notice that.
Let's now get back to Intel SGX -- what is the actual Root of Trust for this technology? Of course, the processor, just like for the old ring3/ring0 separation. But for SGX there is additional Root of Trust which is used for remote attestation, and this is the private key(s) used for signing the Quote Messages.
If the signing private key somehow got into the hands of an adversary, the remote attestation breaks down completely. Suddenly the “SGX Blackboxed” apps and malware can readily be decrypted, disassembled and reverse engineered, because the adversary can now emulate their execution step by step under a debugger and still pass the remote attestation. We might say this is good, as we don't want irreversible malware and apps. But then, suddenly, we also loose our attractive “trusted cloud” too – now there is nothing that could stop the adversary, who has the private signing key, to run our trusted VM outside of SGX, yet still reporting to us that it is SGX-protected. And so, while we believe that our trusted VM should be trusted and unsniffable, and while we devote all our deepest secrets to it, the adversary can read them all like on a plate.
And the worst thing is – even if somebody took such a processor, disassembled it into pieces, analyzed transitor-by-transitor, recreated HDL, analyzed it all, then still it all would look good. Because the backdoor is... the leaked private key that is now also in the hands of the adversary, and there is no way to prove it by looking at the processor alone.
As I understand, the whole idea of having a separate TPM chip, was exactly to make such backdoor-by-leaking-keys more difficult, because, while we're all forced to use Intel or AMD processors today, it is possible that e.g. every country can produce their own TPM, as it's million times less complex than a modern processor. So, perhaps Russia could use their own TPMs, which they might be reasonably sure they use private keys which have not be handed over to the NSA.
However, as I mentioned in the first part of this article, sadly, this scheme doesn't work that well. The processor can still cheat the external TPM module. For example, in case of an Intel TXT and TPM – the processor can produce incorrect PCR values in response to certain trigger – in that case it no longer matters that the TPM is trusted and keys not leaked, because the TPM will sign wrong values. On the other hand we go back now to using “traditional” backdoors in the processors, whose main disadvantage is that people might got cought using them (e.g. somebody analyzed an exploit which turns out to be triggering correct Quote message despite incorrect PCRs).
So, perhaps, the idea of separate TPM actually does make some sense after all?
What about just accidental bugs in Intel products?
Conspiracy theories aside, what about accidental bugs? What are the chances of SGX being really foolproof, at least against those unlucky adversaries who didn't get access to the private signing keys? The Intel's processor have become quite a complex beasts these days. And if you also thrown in the Memory Controller Hub, it's unimaginably complex beast.
Let's take a quick tour back discussing some spectacular attacks against Intel “hardware” security mechanisms. I wrote “hardware” in quotation marks, because really most of these technologies is software, like most of the things in electronics these days. Nevertheless the “hardware enforced security” does have a special appeal to lots of people, often creating an impression that these must be some ultimate unbreakable technologies....
I think it all started with our exploit against Intel Q35 chipset (slides 15+) demonstrated back in 2008 which was the first attack allowing to compromise, otherwise hardware-protected, SMM memory on Intel platforms (some other attacks against SMM shown before assumed the SMM was not protected, which was the case on many older platforms).
This was then shortly followed by another paper from us about attacking Intel Trusted Execution Technology (TXT), which found out and exploited a fact that TXT-loaded code was not protected against code running in the SMM mode. We used our previous attack on Q35 against SMM, as well as found a couple of new ones, in order to compromise SMM, plant a backdoor there, and then compromise TXT-loaded code from there. The issue highlighted in the paper has never really been correctly patched. Intel has spent years developing something they called STM, which was supposed to be a thin hypervisor for SMM code sandboxing. I don't know if the Intel STM specification has eventually been made public, and how many bugs it might be introducing on systems using it, or how much inaccurate it might be.
In the following years we presented two more devastating attacks against Intel TXT (none of which depending on compromised SMM): one which exploited a subtle bug in the processor SINIT module allowing to misconfigure VT-d protections for TXT-loaded code, and another one exploiting a classic buffer overflow bug also in the processor's SINIT module, allowing this time not only to fully bypass TXT, but also fully bypass Intel Launch Control Policy and hijack SMM (several years after our original papers on attacking SMM the old bugs got patched and so this was also attractive as yet another way to compromise SMM for whatever other reason).
Invisible Things Lab has also presented first, and as far as I'm aware still the only one, attack on Intel BIOS that allowed to reflash the BIOS despite Intel's strong “hardware” protection mechanism to allow only digitally signed code to be flashed. We also found out about secret processor in the chipset used for execution of Intel AMT code and we found a way to inject our custom code into this special AMT environment and have it executed in parallel with the main system, unconstrained by any other entity.
This is quite a list of Intel significant security failures, which I think gives something to think about. At the very least that just because something is “hardware enforced” or “hardware protected” doesn't mean it is foolproof against software exploits. Because, it should be clearly said, all our exploits mentioned above were pure software attacks.
But, to be fair, we have never been able to break Intel core memory protection (ring separation, page protection) or Intel VT-x. Rafal Wojtczuk has probably came closest with his SYSRET attack in an attempt to break the ring separation, but ultimately the Intel's excuse was that the problem was on the side of the OS developers who didn't notice subtle differences in the behavior of SYSRET between AMD and Intel processors, and didn't make their kernel code defensive enough against Intel processor's odd behavior.
We have also demonstrated rather impressive attacks bypassing Intel VT-d, but, again, to be fair, we should mention that the attacks were possible only on those platforms which Intel didn't equip with so called Interrupt Remapping hardware, and that Intel knew that such hardware was indeed needed and was planning it a few years before our attacks were published.
So, is Intel SGX gonna be as insecure as Intel TXT, or as secure as Intel VT-x....?
The bottom line
Intel SGX promises some incredible functionality – to create protected execution environments (called enclaves) within untrusted (compromised) Operating System. However, for SGX to be of any use on a client OS, it is important that we also have technologies to implement trusted output and input from/to the SGX enclave. Intel currently provides little details about the former and openly admits it doesn't have the later.
Still, even without trusted input and output technologies, SGX might be very useful in bringing trust to the cloud, by allowing users to create trusted VMs inside untrusted provider infrastructure. However, at the same time, it could allow to create applications and malware that could not be reversed engineered. It's quote ironic that those two applications (trusted cloud and irreversible malware) are mutually bound together, so that if one wanted to add a backdoor to allow A/V industry to be able to analyze SGX-protected malware, then this very same backdoor could be used to weaken the guarantees of the trustworthiness of the user VMs in the cloud.
Finally, a problem that is hard to ignore today, in the post-Snowden world, is the ease of backdooring this technology by Intel itself. In fact Intel doesn't need to add anything to their processors – all they need to do is to give away the private signing keys used by SGX for remote attestation. This makes for a perfectly deniable backdoor – nobody could catch Intel on this, even if the processor was analyzed transistor-by-transistor, HDL line-by-line.
As a system architect I would love to have Intel SGX, and I would love to believe it is secure. It would allow to further decompose Qubes OS, specifically get rid of the hypervisor from the TCB, and probably even more.
Special thanks to Oded Horowitz for turning my attention towards Intel SGX.
Friday, August 30, 2013
Intel Software Guard Extensions (SGX) might very well be The Next Big Thing coming to our industry, since the introduction of Intel VT-d, VT-x, and TXT technologies in the previous decade. It apparently seem to promise what so far has never been possible – an ability to create a secure enclave within a potentially compromised OS. It sounds just too great, so I decided to take a closer look and share some early thoughts on this technology.
Intel SGX – secure enclaves within untrusted world!
Intel SGX is an upcoming technology, and there is very little public documents about it at the moment. In fact the only public papers and presentations about SGX can be found in the agenda of one security workshop that took place some two months ago. The three papers from Intel engineers presented there provide a reasonably good technical introduction to those new processor extensions.
You might think about SGX as of a next generation of Intel TXT – a technology that has never really took off, and which has had a long history of security problems disclosed by certain team of researchers ;) Intel TXT has also been perhaps the most misunderstood technology from Intel – in fact many people thought about TXT as if it already could provide security enclaves within untrusted OS – this however was not really true (even ignoring for our multiple attacks) and I have spoke and wrote many times about that in the past years.
It's not clear to me when SGX will make it to the CPUs that we could buy in local shops around the corner. I would be assuming we're talking about 3-5 years from now, because the SGX is not even described in the Intel SDM at this moment.
Intel SGX is essentially a new mode of execution on the CPU, a new memory protection semantic, plus a couple of new instructions to manage this all. So, you create an enclave by filling its protected pages with desired code, then you lock it down, measure the code there, and if everything's fine, you ask the processor to start executing the code inside the enclave. Since now on, no entity, including the kernel (ring 0) or hypervisor (ring “-1”), or SMM (ring “-2”) or AMT (ring “-3”), has no right to read nor write the memory pages belonging to the enclave. Simple as that!
Why have we had to wait so long for such technology? Ok, it's not really that simple, because we need some form of attestation or sealing to make sure that the enclave was really loaded with good code.
The cool thing about an SGX enclave is that it can coexist (and so, co-execute) together with other code, such all the untrusted OS code. There is no need to stop or pause the main OS, and boot into a new stub mini-OS, like it was with the TXT (this is what e.g. Flicker tried to do, and which was very clumsy). Additionally, there can be multiple enclaves, mutually untrusted, all executing at the same time.
No more stinkin' TPMs nor BIOSes to trust!
A nice surprise is that SGX infrastructure no longer depends on the TPM to do measurements, sealing and attestation. Instead Intel has a special enclave that essentially emulates the TPM. This is a smart move, and doesn't decrease security in my opinion. It surely makes us now trust only Intel vs. trusting Intel plus some-asian-TPM-vendor. While it might sound like a good idea to spread the trust between two or more vendors, this only really makes sense if the relation between trusting those vendors is expressed as “AND”, while in this case the relation is, unfortunately of “OR” type – if the private EK key gets leaked from the TPM manufacture, we can bypass any remote attestation, and no longer we need any failure on the Intel's side. Similarly, if Intel was to have a backdoor in their processors, this would be just enough to sabotage all our security, even if the TPM manufacture was decent and played fair.
Because of this, it's generally good that SGX allows us to shrink the number of entities we need to trust down to just one: Intel processor (which, these days include the CPUs as well as the memory controller, and, often, also a GPU). Just to remind – today, even with a sophisticated operating system architecture like those we use in Qubes OS, which is designed with decomposition and minimizing trust in mind, we still need to trust the BIOS and the TPM, in addition to the processor.
And, of course, because SGX enclaves memories are protected against any other processor mode's access, so SMM backdoor no longer can compromise our protected code (in contrast to TXT, where SMM can subvert a TXT-loaded hypervisor), nor any other entity, such as the infamous AMT, or malicious GPU, should be able to do that.
So, this is all very good. However...
Secure Input and Output (for Humans)
For any piece of code to be somehow useful, there must be a secure way to interact with it. In case of servers, this could be implemented by e.g. including the SSL endpoint inside the protected enclave. However for most applications that run on a client system, ability to interact with the user via screen and keyboard is a must. So, one of the most important questions is how does Intel SGX secures output to the screen from an SGX enclave, as well as how does it ensure that the input the enclave gets is indeed the input the user intended?
Interestingly, this subject is not very thoroughly discussed in the Intel papers mentioned above. In fact only one paper briefly mentions Intel Protected Audio Video Path (PVAP) technology that apparently could be used to provide secured output to the screen. The paper then references... a consumer FAQ onBlueRay Disc Playback using Intel HD graphics. There is no further technical details and I was also unable to find any technical document from Intel about this technology. Additionally this same paper admits that, as of now, there is no protected input technology available, even on prototype level, although they promise to work on that in the future.
This might not sound very surprising – after all one doesn't need to be a genius to figure out that the main driving force behind this whole SGX thing is the DRM, and specifically protecting Holywwod media against the pirate industry. This would be nothing wrong in itself, assuming, however, the technology could also have some other usages, that could really improve security of the user (in contrast to the security of the media companies).
We shall remember that all the secrets, keys, tokens, and smart-cards, are ultimately to allow the user to access some information. And how does people access information? By viewing in on a computer screen. I know, I know, this so retro, but until we have direct PC-brain interfaces, I'm afraid that's the only way. Without properly securing the graphics output, all the secrets can be ultimately leaked out.
Also, how people command their computers and applications? Well, again using this retro thing called keyboard and mouse (touchpad). However secure our enclave might be, without secured input, the app would not be able to distinguish intended user input from simulated input crafted by malware. Not to mention about such obvious attacks as sniffing of the user input.
Without protected input and output, SGX might be able to stop the malware from stealing the user's private keys for email encryption or issuing bank transactions, yet the malware will still be able to command this super-secured software to e.g. decrypt all the user emails and later steal the screenshots of all the plaintext messages (with a bit of simple programming, the screenshot's could be turned back into nice ASCII text for saving on bandwidth when leaking them out to a server in Hong Kong), or better yet, perhaps just forward them to an email address that the attacker controls (perhaps still encrypted, but using the attackers key).
But, let's ignore for a moment this “little issue” of lack of protected input, and lack of technical documentation on how secure graphics output is really implemented. Surely it is thinkable that protected input and output could be implemented in a number of ways, and so let's hope Intel will do it, and will do right. We should remember here, that whatever mechanism Intel is going to use to secure the graphics and audio output, it surely will be an attractive target of attacks, as there is probably a huge money incentive for such attacks in the film illegal copying business.
Securing mainstream client OSes and why this is not so simple?
As mentioned above, for SGX enclaves to be truly meaningful on client systems we need protected input and output, to and from the secured enclaves. Anyway, lets assume for now that Intel has come up with robust mechanisms to provide these. Let's now consider further, how SGX could be used to turn our current mainstream desktop systems into reasonably secure bastions.
We start with a simple scenario – a dedicated application for viewing of incoming encrypted files, say PDFs, performing their decryption and signature verification., and displaying of the final outcome to the user (via protected graphics path). The application takes care about all the key management too. All this happens, of coruse, inside an SGX enclave(s).
Now, this sounds all attractive and surely could be implemented using the SGX. But what about if we wanted our secure document viewer to become a bit more than just a viewer? What if we wanted a secure version of MS Word or Excel, with its full ability to open complex documents and edit them?
Well it's obviously not enough to just put the proverbial msword.exe into a SGX enclave. It is not, because the msword.exe makes use of million of other things that are provided by the OS and 3rd libraries, in order to perform all sorts of tasks it is supposed to do. It is not a straightforward decision to draw a line between those parts that are security sensitive and those that are not. Is font parsing security critical? Is drawing proper labels on GUI buttons and menu lists security critical? Is rendering of various objects that are part of the (decrypted) document, such as pictures, security critical? Is spellchecking security critical? Even if the function of some of a subsystem seem not security critical (i.e. not allows to easily leak the plaintext document out of the enclave), let's not forget that all this 3rd party code would be interacting very closely with the enclave-contained code. This means the attack surface exposed to all those untrusted 3rd party modules will be rather huge. And we already know it is rather not possible to write a renderer for such complex documents as PDFs, DOCs, XLS, etc, without introducing tons of exploitable bugs. And these attack are not coming now from the potentially malicious documents (against those we protect, somehow, by parsing only signed document from trusted peers), but are coming from the compromised OS.
Perhaps it would be possible to take Adobe Reader, MS Word, Powerpoint, Excel etc, and just rewrite every of those apps from scratch in a way that they were properly decomposed into sensitive parts that execute within SGC enclave(s), and those that are not-sensitive and make use of all the OS-provided functionality, and further define clean and simple interfaces between those parts, ensuring the “dirty” code cannot exploit the sensitive code. Somehow attractive, but somehow I don't see this happening anytime soon.
But, perhaps, it would be easier to do something different – just take the whole msword.exe, all the DLLs it depends on, as well as all the OS subsystems it depends on, such as the GUI subsystem, and put all of this into an enclave. This sounds like a more rational approach, and also more secure.
Only notice one thing – we just created... a Virtual Machine with Windows OS inside and the msword.exe that uses this Windows OS.. Sure, it is not a VT-x-based VM, it is an SGX-based VM now, but it is largely the same animal!
Again, we came to the conclusion why the use of VMs is suddenly perceived as such an increase in security (which some people cannot get, claiming that introducing VM-layer only increases complexity) – the use of VMs is profitable because of one of thing: it suddenly packs all the fat libraries- and OS-exposed APIs and subsystems into one security domain, reducing all the interfaces between this code in the VM and the outside world. Reducing of the interfaces between two security domains is ALWAYS desirable.
But our SGX-isolated VMs have one significant advantage over the other VM technologies we got used to in the last decade or so – namely those VMs can now be impenetrable to any other entity outside of the VM. No kernel or hypervisor can peek into its memory. Neither can the SMM, AMT, or even a determined physical attacker with DRAM emulator, because SGX automatically encrypts any data that leave the processor, so everything that is in the DRAM is encrypted and useless to the physical attacker.
This is a significant achievement. Of course SGX, strictly speaking, is not a (full) virtualization technology, it's not going to replace VT-x.. But remember we don't always need full virtualization, like VT-x, often we can use paravirtualization and all we need in that case is a good isolation technology. For examaple, Xen uses paravirtualization for Linux-based PV VMs, and uses good-old ring3/ring0 separation mechanism to implement this, and the level of isolation of such PV domains on Xen is comparable to the isolation of HVMs, which are virtualized using VT-x.
To Be Continued
In the next part of this article, we will look into some interesting unconventional uses of SGX, such as creating malware that cannot be reversed engineered, or TOR nodes or Bitcoin mixers that should be reasonably trusted, even if we don't trust their operators. Then we will discuss how SGX might profoundly change the architecture of the future operating systems, and virtualization systems, in a way that we will no longer need to trust (large portions of) their kernels or hypervisors, or system admins (Anti Snowden Protection?) And, of course, how our Qubes OS might embrace this technology in the future.
Finally, we should discuss the important issue of whether this whole SGX, while providing many great benefits for system architects, should really be blindly trusted? What are the chances of Intel building in backdoors there and exposing those to the NSA? Is there any difference in trusting Intel processors today vs. trusting the SGX as a basis of security model of all software in the future?
Friday, June 21, 2013
In a previous post I have outlined a new direction we're aiming with the Qubes project, which is a departure from using a “hardcoded” hypervisor with Qubes (as well as “hardcoded” Linux as Dom0, GUI domain, etc).
Today I'm happy to announce that we've already completed initial porting of the current Qubes OS into this Hypervisor-Abstraction-Layer-based framework. The new version of Qubes, that we call “R3 Alpha” for now, builds fine, installs fine, and even (mostly) works(!), as can be admired on the screenshot below :) It still uses Xen, of course, but this time in a non-hardcoded way, which allows to replace it easily with another hypervisor, as I discuss below.
Our Qubes Odyssey backend needed to support a specific hypervisor comprises essentially three parts:
- A libvirt driver to support a given VMM. In our case we got it (almost) for free, because Xen 4.2 is well supported by libvirt. I wrote “almost” for free, because some patches to libvirt were still needed, mostly to get rid of some unjustified simplifying assumptions, such as that all the backends are always in Dom0, which is not the case for Qubes OS, of course. Some of those patches were accepted into upstream libvirt, some (still) not, so we had to fork libvirt.
- A VMM-specific implementation of our vchan – a simple, socket-like, VMM shared memory-based communication stack between the VMs. Again, in case of Xen 4.2 we got that (almost) for free, because Xen 4.2 has now included a libxenvchan component, which is modified (improved and cleaned up) version of our original vchan (written in early Qubes days for older Xen versions) contributed and maintained by Daniel De Graff from the NSA.
- Some minor configuration files, e.g. to tell libvirt which hypervisor protocol to use (in our case: xen:///), and VM configuration template files.
Now, if one wanted to switch Xen for some other hypervisor, such as e.g. the KVM, we would need to write a KVM Odyssey backend in a form of providing the above mentioned three elements. Again, libvirt driver we would get for free, configuration files would be trivial to write, and the only task which would require some coding would be the vchan for KVM.
Ok, one thing that is left out (non-HAL'ified) for now, is the xc_map_foreign_pages() Xen-specific function call within our GUI daemon.
Ideally such call could also be handled by the libvirt API, however it's not clear to us whether true zero-copy page access is really supported (and intended). If it is not, we will try to contribute a patch to libvirt to add such functionality, as it is generally useful for many things that involve high-speed inter-VM communication, of which our GUI virtualization is just one example. So, at this moment, one would need to add an ugly #if (BACKEND_VMM == ...) to the code above and use another VMM's function(s), equivalent to the xc_map_foreign_pages() on Xen.
But besides the above, essentially everything else should Just Work (TM). And that's pretty amazing, I think :) While I personally can't immediately see any security benefit of switching from Xen to KVM, it might appeal to some people for other reasons (Performance? Better hardware support?). The point is: this should be now easy to do.
If one wanted to support some Windows-based hypervisor, on the other hand, such as MS Hyper-V, or Virtual Box on top of Windows, then two more things will need to be taken care of:
- Our core management stack (the core-admin repository), the core RPC services (mostly the qrexec daemon, currently part of core-admin-linux repo), and the libvirt code (core-libvirt, a forked original libvirt with some custom patches I mentioned above), all would need to build and run fine on Windows. While this is not a big problem for core-admin (it's all python) and core-libvirt (it is supposed to build and run on Windows fine), the qrexec daemon would need to be rewritten with Windows OS in mind. We're currently working on this step, BTW.
- The GUI daemon would also need to be ported to run on Windows, instead of on top of X Server. This is somehow orthogonal to the need to get rid of the hardcoded xc_map_foreign_pages() function as mentioned above. This step might be optional, however, if we wanted to use a Linux-based (and so Xorg-based GUI server) as a GUI domain.
Once the above two pieces are made Windows-ready (note how I wrote Windows-ready, and not specific-VMM-ready), we can then use any Windows-based hypervisor we want (i.e. for which we have libvirt driver, and can write vchan).
This is again pretty amazing, because it means we don't need N*M variations of each component (where N is the number of VMMs, and M the number of host/GUI OSes to support) – but only N+M! This is similar to how modern compilers are designed using a language-specific frontends (C, C++, Pascal, C#, etc), and architecture-specific backends (x86, x64, ARM, etc), and an Intermediate Language for internal “grinding”, again achieving this N+M number of needed variants instead of N*M, which otherwise would be just totally impractical.
One other detail I would like to point out, and which is also visible on the screenshot above, is that we also got rid of using the Xen-specific Xenstore infrastructure (a registry-like tree-based infrastructure for inter-VM configuration and status exchange), and we replaced it with our own, vchan-based Qubes DB (core-qubesdb).
One interesting thing about Qubes DB is that it get rids of the (overly complex and unnecessary) permission system that is used by xenstore, and instead uses the most simple approach: each VM has its separate Qubes DB daemon, and so a totally separate configuration/state namespace. This is inline with the rest of the Qubes philosophy, which basically says that: permissions is dead, long live separation!
So, in Qubes OS we just isolate everything by default, unless a user/configuration specifically allows an exception – e.g. no file copy operation between domains is possible, unless the user expresses an explicit consent for it.
Many old-school security people can't imagine a system without permissions, but if we think about it more, we might get to a conclusion that: 1) permissions are complex and so often difficult to understand and set correctly, 2) require often complex code to parse and make security decisions, and 3) often are absolutely unneeded.
As a practical example of how permissions schemes might sometime trick even (otherwise somehow smart) developers into making a mistake consider this bug in Qubes we made a long time ago when setting permissions on some xenstore key, which resulted in some information leak (not much of a security problem in general, but still). And just today, Xen.org has published this advisory, that sounds pretty serious, again caused by bad permissions on some xenstore keys. (Yes, we do have updated Xen packages to fix that, of course).
Back to Qubes R3 Alpha, the first successful Qubes based on Odyssey HAL framework. As previously mentioned, we plan to make most of the framework open sourced, specifically all the non-Windows code. However, we're not publishing this Odyssey/R3 code at this moment, mainly for two reasons: 1) we don't want people to immediately start building other backends, such as to support KVM, right at this stage, because we still might want/need to modify some interfaces slightly, e.g. for our vchan, and we don't want to tide our hands now, and 2) the other reason is that we're still in the middle of “Beta” releases for Qubes R2, and we want people to rather focus on testing that, rather stable release, than jumping onto Qubes R3 alpha.
In other news: everybody seems to be genuinely surprised that unencrypted information can be intercepted and analyzed without user consent... Can it be that people will "discover" cryptography now? How many of you use PGP everyday? And how long will it take then to understand that cryptography without secure client devices is useless?
Thursday, March 21, 2013
Qubes OS is becoming more and more advanced, polished, and user friendly OS.
But Qubes OS, even as advanced as it is now, surely have its limitations. Limitations, that for some users might be difficult to accept, and might discourage them from even trying out the OS. One such limitation is lack of 3D graphics support for applications running in AppVMs. Another one is still-far-from-ideal hardware compatibility – a somehow inherent problem for most (all?) Linux-based systems.
There is also one more “limitation” of Qubes OS, particularly difficult to overcome... Namely that it is a standalone Operating System, not an application that could be installed inside the user's existing OS. While installing a new application that increases system's security is a no-brianer for most people, switching to a new, exotic OS, is quite a different story...
Before I discuss how we plan to address those limitations, let's first make a quick digression about what Qubes really is, as many people often get that wrong...
What Qubes IS, and what Qubes IS NOT?
Qubes surely is not Xen! Qubes only uses Xen to create isolated containers – security domains (or zones). Qubes also is not a Linux distribution! Sure, we currently use Fedora 18 as the default template for AppVMs, but at the same time we also support Windows VMs. And while we also use Linux as GUI and admin domain, we could really use something different – e.g. Windows as GUI domain.
So, what is Qubes then? Qubes (note how I've suddenly dropped the OS suffix) is several things:
- The way how to configure, harden, and use the VMM (e.g. Xen) to create isolated security domains, and to minimize overall system TCB.
- Secure GUI virtualization that provides strong gui isolation, while at the same time, provides also seamless integration of all applications running in different VMs onto one common desktop. Plus a customized GUI environment, including trusted Window Manager that provides unspoofable decorations for the applications' windows.
- Secure inter-domain communication and services infrastructure with centrally enforced policy engine. Plus some “core” services built on top of this, such as secure file exchange between domains.
- Various additional services, or “addons”, built on top of Qubes infrastructure, such as Disposable VMs, Split GPG, TorVM, Trusted PDF converter, etc. These are just few examples, as basically the sky is the limit here.
- Various additional customizations to all the guest OSes that run in various domains: GUI, Admin, ServiceVMs, and AppVMs.
Introducing Qubes HAL: Hypervisor Abstraction Layer
Because Qubes is a bunch of technologies and approaches that are mostly independent from the underlying hypervisor, as discussed above, it's quite natural to consider if we could easily build an abstraction layer to allow the use of different VMMs with Qubes, instead of just Xen? Turns out this is not as difficult as we originally thought, and this is exactly the direction we're taking right now with Qubes Odyssey! To make this possible we're going to use the libvirt project.
So, we might imagine Qubes that is based on Hyper-V or even Virtual Box or VMWare Workstation. In the case of the last two Qubes would no longer be a standalone OS, but rather an “application” that one installs on top of an existing OS, such as Windows. The obvious advantage we're gaining here is improved hardware compatibility, and ease of deployment.
And we can go even further and ask: why not use Windows Native Isolation, i.e. mechanisms such as user account separation, process isolation, and ACLs, to implement domain isolation? In other words why not use Windows OS as a kind of “VMM”? This would further dramatically improve then lightness of the system...
Of course the price we pay for all this is progressively degraded security, as e.g. Virtual Box cannot be a match to Xen in terms of security, both architecturally and implementation-wise, and not to mention the quality of isolation provided by the Windows kernel, which is even less.
But on the other hand, it's still better than using “just Windows” which offers essentially only one “zone”, so no domain isolation at all! And if we can get, with minimal effort, most of our Qubes code to work with all those various isolation providers then... why not?
Being able to seamlessly switch between various hypervisors is only part of the story, of course. The remaining part is the support for different OSes used for various Qubes domains. Currently we use Linux, specifically Fedora 18, in our GUI & Admin domain, but there is no fundamental reason why we couldn't use Windows there instead. We discuss this more in-depth in one of the paragraphs below.
The diagram below tries to illustrate the trade-offs between hardware compatibility and ease of deployment vs. security when using different isolation backends with Qubes. Some variants might also offer additional benefits, such as “super-lightness” in terms of CPU and memory resources required, as is the case with Windows Native Isolation.
Some example configurations
Let's now discuss two extreme variants of Qubes – one based on the baremetal Xen hypervisor and the other one based on Windows Native Isolation, so a variant from the opposite end of the spectrum (as shown on the illustration above).
The diagram below shows a configuration that uses a decent baremetal hypervisor, such as Xen, with abilities to securely assign devices to untrusted service domains (NetVM, UsbVM). So, this is very similar to the current Qubes OS.
Additionally we see separate GUI and Admin domains: the GUI domain might perhaps be based on Windows, to provide users with a familiar UI, while the Admin domain, tasked with domain management and policy enforcement, might be based on some minimal Linux distribution.
In the current Qubes OS there is no distinction between a GUI and an Admin domain -- both are hosted within one domain called “dom0”. But in some cases there are benefits of separating the GUI domain from the Admin domain. In a corporate scenario, for example, the Admin domain might be accessible only to the IT department and not to the end user. This way the user would have no way of modifying system-wide policies, and e.g. allowing their “work” domain to suddenly talk to the wild open Internet, or to copy work project files from “work” to “personal” domains (save for the exotic, low-bandwidth covert channels, such as through CPU cache).
The ability to deprivilege networking and USB stacks by assigning corresponding devices (NICs, and USB controllers) to untrusted, or semi-trused, domains provides great security benefits. This automatically prevents various attacks against the bugs in WiFi stacks or USB stacks.
What is not seen on the diagram, but what is typical for baremetal hypervisors is that they are usually much smaller than hosted hypervisors, implementing less services, and delegating most tasks, such as the infamous I/O emulation to (often) unprivileged VMs.
Let's now look at the other extreme example of using Qubes – the diagram below shows an architecture of a “Qubized” Windows system that uses either a hosted VMM, such as Virtual Box or VMWare Workstation, or even the previously mentioned Windows Native Isolation mechanisms, as an isolation provider for domains.
Of course this architecture lacks many benefits discussed above, such as untrusted domains for networking and USB stacks, small hypervisors, etc. But it still can be used to implement multiple security domains, at a much lower “price”: better hardware compatibility, easier deployment, and in case of Windows Native Isolation – excellent performance.
And it really can be made reasonable, although it might require more effort than it might seem at first sight. Take Windows Native Isolation – of course just creating different user accounts to represent different domains is not enough, because Windows still doesn't implement true GUI-level isolation. Nor network isolation. So, there is a challenge to do it right, and “right” in this case means to make the isolation as good as the Windows kernel can isolate processes from different users from each other.
Sure, a single kernel exploit destroys this all, but it's still better than “one application can (legally) read all my files” policy that 99% of all desktop OSes out there essentially implement today.
Now, probably the best thing with all this is that once we implement a product based on, say, Qubes for Windows, together with various cool “addons” that will take advantage of the Qubes services infrastructure, and which shall be product-specific, it should then be really easy to upgrade to another VMM, say Hyper-V to boost security. And the users shall not even notice a change in the UI, save for the performance degradation perhaps (well, clearly automatic creation of VMs to handle various users tasks would be more costly on Hyper-V than with Windows Native Isolation, where “VMs” are just... processes).
Qubes building blocks – implementation details
Let's have a look now at the repository layout for the latest Qubes OS sources – every name listed below represents a separate code repository that corresponds to a logical module, or a building block of a Qubes system:
Because current Qubes R2 still doesn't use HAL layer to support different hypervisors, it can currently be used with only one hypervisor, namely Xen, whose code is provided by the vmm-xen repository (in an ideal world we would be just using vanilla Xen instead of building our own from sources, but in reality we like the ability to build it ourselves, slightly modifying some things).
Once we move towards the Qubes Odyssey architecture (essentially by replacing the hardcoded calls to Xen's management stack, in the core-admin module, with libvirt calls), we could then easily switch Xen for other hypervisors, such as Hyper-V or Virtual Box. In case of Hyper-V we would not have access to the sources of the VMM, of course, so would just be using the stock binaries, although we still might want to maintain the vmm-hyperv repository that could contain various hardening scripts and configuration files for this VMM. Or might not. Also, chances are high that we would be just able to use the stock libvirt drivers for Hyper-V or Virtual Box, so no need for creating core-libvirt-hyperv or core-libvirt-virtualbox backends.
What we will need to provide, is our custom inter-domain communication library for each hypervisor supported. This means we will need to write core-vchan-hyperv or core-vchan-virtualbox. Most (all?) VMMs do provide some kind of API for inter-VM communication (or at least VM-host communication), so the main task of such component is to wrap the VMM-custom mechanism with Qubes-standarized API for vchan (and this standardization is one thing we're currently working on). All in all, in most cases this will be a simple task.
If we, on the other hand, wanted to support an “exotic” VMM, such as the previously mentioned Windows Native Isolation, which is not really a true VMM, then we will need to write our own libvirt backend to support is:
... as well as the corresponding vchan module (which should be especially trivial to write in this case):
Additionally, if we're building a system where the Admin domain is not based on Linux, which would likely be the case if we used Hyper-V, or Virtual Box for Windows, or, especially, Windows Native Isolation, then we should also provide core-admin-windows module, that, among other things, should provide Qubes qrexec implementation, something that is highly OS-dependent.
As can be seen above, we currently only have core-admin-linux, which is understandable as we currently use Linux in Dom0. But the good news is that we only need to write core-admin-XXX once for each OS that is to be supported as an Admin domain, as this code should not be depend on the actual VMM used (thanks to our smart HAL).
Similarly, we also need to assure that our gui-daemon can run on the OS that is to be used as a GUI domain (again, in most cases GUI domain would be the same as Admin domain, but not always). Here the situation is generally much easier because “with just a few #ifdefs” our current GUI daemon should compile and run on most OSes, from Linux/Xorg to Windows and Macs (which is the reason we only have one gui-daemon repository, instead of several gui-daemon-XXX).
Finally we should provide some code that will gather all the components needed for our specific product and package this all into either an installable ISO, if Qubes is to be a standalone OS, like current Qubes, or into an executable, in case Qubes is to be an “application”. The installer, depending on the product, might do some cool things, such as e.g. take current user system and automatically move it into one of Qubes domains.
To summary, these would be the components needed to build “Qubes for Windows” product:core-admin
Additionally we will likely need a few qubes-app-* modules that would implement some "addons", such as perhaps automatic links and documents opening in specific VMs, e.g.:
Here, again, the sky's the limit and this is specifically the area where each vendor can go to great lenghts and build killer apps using our Qubes framework.
Now, if we wanted to create "Qubes for Hyper-V" we would need the following components:
Here, as an example, I also left optional core-agent-linux and gui-agent-linux components (the same that are to be used with Xen-based Qubes OS) to allow support for also Linux-based VMs – if we can get those “for free”, then why not!
It should be striking how many of those components are the same in both of those two cases – essentially the only differences are made by the use of different vmm-* components and, of course, the different installer
It should be also clear now how this framework now enables seamless upgrades from one product (say Qubes for Windows) to another (say Qubes for Hyper-V).
Our business model assumes working with vendors, as opposed to end users, and licensing to them various code modules needed to create products based on Qubes.
All the code that comprises the base foundation needed for creation of any Qubes variant (so core-admin, gui-common, gui-daemon, qubes-builder and qubes-manager) will be kept open source, GPL specifically. Additionally all the code needed for building of Xen-based Qubes OS with Linux-based AppVMs and Linux-based GUI and Admin domains, will continue to be available as open source. This is to ensure Qubes OS R3 that will be based on this framework, can remain fully open source (GPL).
Additionally we plan to double-license this core open source code to vendors who would like to use it in proprietary products and who would not like to be forced, by the GPL license, to share the (modified) sources.
All the other modules, especially those developed to support other VMMs (Hyper-V, Virtual Box, Windows Native Isolation), as well as those to support Windows OS (gui-agent-windows, core-agent-windows, core-admin-windows, etc) will most likely be proprietary and will be available only to vendors who decide to work with us and buy a license.
So, if you want to develop an open source product that uses Qubes framework, then you can freely do that as all the required core components for this will be open sourced. But if you would like to make a proprietary product, then you should buy a license from us. I think this is a pretty fair deal.
Current status and roadmap
We're currently working on two fronts: one is rewriting current Qubes code to support Qubes HAL, while the other one is adding a backend for Windows Native Isolation (which also involves doing things such as GUI isolation right on Windows).
We believe that by implementing two such extreme backends: Xen and Windows Native Isolation we can best show the flexibility of the framework (plus our customer is especially interested in this backend;)
We should be able to publish some code, i.e. the framework together with early Qubes OS R3 that will be based on it, sometime in fall or maybe earlier.
We obviously are determined to further develop Xen-based Qubes OS, because we believe this is the most practically secure OS available today, and we believe such OS should be open source.
Qubes R2 will still be based on the Xen-hardcoded code, because it's close to the final release and we don't want to introduce such drastic changes at this stage. The only thing that Qubes R2 will get in common with Qubes Odyssey is this new source code layout as presented above (but still with hardcoded xl calls and xen-vchan).
So, this is all really exciting and a big thing, let's see if we can change the industry with this :)
Oh, and BTW, some readers might be wondering why the framework was codenamed “Odyssey” -- this is obviously because of the “HAL” which plays a central role here, and which, of course, also brings to mind the famous Kubrick's movie.