Thoughts on Security

Posted by Uiri on Aug 31 2012

Stripe ran a "Capture the Flag" challenge from Wednesday of last week until a couple of days ago. The challenge was, essentially, a series of nine web applications whose security had to be broken. Anyone who managed to break them all would be sent a free special edition Stripe t-shirt and be shown on the leaderboard. Stripe gets varied benefits out of this: they might just run these because they're fun to set up and fun to solve, and they're probably also looking for potential hires.

I started as soon as I heard about it, so sometime in the afternoon on the 22nd, and I finished at 1 or 2 am Friday night. I finished in the 150s or so, and almost 1000 people ended up finishing before the week was up. The vulnerabilities I used were SQL injections, some stupid vulnerabilities in some PHP code, XSS (which was really HTML or JavaScript injection, because an external script wouldn't work), the fact that Rack's params accept both GET and POST parameters, a SHA-1 length extension attack and a side channel attack involving port numbers.

Clearly, protection against SQLi and XSS is relatively straightforward: use parameterized queries, don't trust user input, escape data for the context where it will be placed, etc. The same goes for the dumb PHP vulnerabilities: don't use functions like extract, check the file type of uploads, use proper filesystem permissions, etc. The attacks used on the penultimate and final levels are much trickier to protect against, however.
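To make the first two defences concrete, here is a minimal sketch in Python using only the standard library. The table, the data and the function names are hypothetical, made up for illustration; they are not taken from the challenge code.

```python
import html
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def find_user_unsafe(name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


def render_comment(comment):
    # Escape for an HTML text context before placing it in the page.
    return "<p>" + html.escape(comment) + "</p>"


payload = "x' OR '1'='1"
leaked = find_user_unsafe(payload)  # the injected OR clause matches every row
nothing = find_user_safe(payload)   # no user has that literal name
safe_html = render_comment("<script>alert(1)</script>")
```

The same payload dumps the table through the concatenated query and matches nothing through the parameterized one. Note that html.escape only covers HTML text and attribute contexts; data placed inside a script block or a URL needs escaping suited to that context.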

In order to protect against a SHA-1 length extension attack you have to know that it exists. If this kind of vulnerability is only discovered many years after the code was written, when the code is a distant memory, fixing it takes more than swapping one form of crypto for another, because there are clients which will still be using the old SHA-1 method. This kind of vulnerability specifically seems unlikely to be discovered by accident. It is much harder, though, to predict how easily a certain hash will be bruteforced several years after the code is written, or to predict collision attacks or the existence of large rainbow tables built because a certain hash algorithm is popular.
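A length extension attack applies when a signature is computed as SHA-1(secret + message). The usual fix is to use HMAC instead, which can be sketched as follows. The secret and the message format here are placeholders of my own, not the ones from the challenge.

```python
import hashlib
import hmac

SECRET = b"hypothetical-shared-secret"  # placeholder key for illustration


def sign_naive(message: bytes) -> str:
    # Vulnerable construction: SHA-1(secret || message) can be extended
    # by an attacker who knows only the digest and the secret's length.
    return hashlib.sha1(SECRET + message).hexdigest()


def sign_hmac(message: bytes) -> str:
    # HMAC's nested hashing is not subject to length extension.
    return hmac.new(SECRET, message, hashlib.sha1).hexdigest()


def verify(message: bytes, signature: str) -> bool:
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sign_hmac(message), signature)


tag = sign_hmac(b"count=10&user=1")
```

Switching the server to sign_hmac is the easy part. As noted above, every client still producing the old signatures has to be migrated too.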

The vulnerability in level 8 had to do with the source port that an authentication server used when it contacted a webhook, which would presumably handle the authentication server's response. Basically, the source port needs to be randomized at what I assume would be the TCP stack level. In other words, your operating system needs to be hardened against source port prediction in order to protect against this vulnerability.
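Here is a toy model of why predictable source ports leak information. It is a sketch under assumptions of my own, not a description of the level's actual code: I assume the server opens one backend connection per password chunk it checks, then one more connection to the webhook, and that ports are handed out sequentially. All class and function names are hypothetical.

```python
import secrets


class SequentialPorts:
    """Toy allocator: each new outbound connection gets the next port."""

    def __init__(self, start=40000):
        self.next_port = start

    def allocate(self):
        port = self.next_port
        self.next_port += 1
        return port


class RandomPorts:
    """Toy allocator: each connection gets an unpredictable port."""

    LOW, HIGH = 32768, 60999  # Linux's default ephemeral port range

    def allocate(self):
        return self.LOW + secrets.randbelow(self.HIGH - self.LOW + 1)


def check_password(ports, chunks_checked):
    # Assumed behaviour: one backend connection per chunk checked,
    # then one connection to the webhook. Return the webhook's port.
    for _ in range(chunks_checked):
        ports.allocate()
    return ports.allocate()


seq = SequentialPorts()
first = check_password(seq, chunks_checked=1)   # wrong at the first chunk
second = check_password(seq, chunks_checked=3)  # first two chunks correct
leak = second - first  # connections the server made for the second request
```

With the sequential allocator, the gap between two webhook ports reveals how many chunks the server checked, even though the response itself only says "wrong". With RandomPorts the gap carries no such information.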

It seems that there is no such thing as a 100% secure system. Bruteforcing will always get easier, side channel attacks may or may not exist in your code and cryptography will continue its march to better algorithms. Here is hoping that your system can be considered "secure enough".

This work by UIRI NOYB is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada License.

Self hosting stuff

Posted by Uiri on Aug 10 2012

The cloud is irrelevant in a world where everyone hosts everything that they need themselves. If sending a file to a webserver you control and getting the link back were a two click operation, there would never be a need to email files to oneself or to other people. It would be as simple as passing around a URL. Dropbox's public folders would surely look less appetizing, as their URLs are far from memorable. When you control the webserver you are sending the file to, though, you can buy whatever nice domain name you want and make the link very memorable: either a simple sequence of words (punctuated by a .com or .ca) or a few short letters and a filename.

Hosting one's own webserver requires either money or time and some level of comfort with technology. Some people who don't need to will spend money on a webhost in order to save themselves time. Hosting a box out of your house is ideal because it is possible to run other convenient services on the same box. As it stands, with a webhost you have to use FTP to upload a file, which isn't the best of protocols. If you can use SSH to access the webserver, though, you can use a simple scp command. Both scp and an FTP client are a far cry from a two click operation, though.
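The gap between scp and a two click operation is mostly a matter of wrapping. As a rough sketch, a small Python helper could build the scp command and the link to pass around. The host, paths and URL prefix are hypothetical placeholders, and the sketch assumes the remote directory is already served by the webserver.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def build_share(local_path, ssh_host, remote_dir, base_url):
    """Return the scp argument list and the public URL for one file."""
    filename = PurePosixPath(local_path).name
    # scp copies the file into the remote directory under the same name.
    command = ["scp", local_path, f"{ssh_host}:{remote_dir}/"]
    url = f"{base_url.rstrip('/')}/{quote(filename)}"
    return command, url


command, url = build_share(
    "notes/trip-photos.zip",      # hypothetical local file
    "me@example.com",             # hypothetical SSH login
    "/var/www/files",             # hypothetical directory served over HTTP
    "https://example.com/files",  # hypothetical public URL prefix
)
# subprocess.run(command, check=True) would perform the actual upload.
```

Bound to a file manager action or a keyboard shortcut, something like this gets close to the two clicks, but somebody still has to set up the server first.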

Something as nontrivial as, essentially, setting up a website so that you can share files with friends is going to scare away the vast majority of users, for whom a two click operation is the only thing that can beat sending an email attachment. This is part of the reason why Dropbox exists. They provide the web infrastructure and make it as simple as dragging a file onto a folder. Dropbox's sharing URLs are far from memorable. The nastiest part is, of course, the user id number. Even if it were usernames instead of numbers, how many people would remember to type the /u/? Without the /u/ between the end of the domain name and the user-specific part, Dropbox will 404, although it will successfully redirect www to dl when appropriate.

I wonder what other services have made ugly hacks like sending large files as email attachments instead of web links obsolete. I am certain that nontechnical users have run into the situation where they used a desktop email client and POP and deleted all of their email off of their email server and then changed computers. Webmail makes this unfortunate series of blunders essentially impossible. It is just as easily avoided with a checkbox in settings but no one actually looks in settings anyways.

I am sure that there are as many ways to use email as there are people and everyone has different preferences. The ways that webmail users use email are constrained by the interface that they are using. Technical users can swap out IMAP servers, POP servers, email clients, mail transfer agents, filtering tools, spam fighting tools and a host of other things in order to arrive at their optimal setup. All of these tools are necessary if one wants to keep all of their email only in their own home and on other people's servers for as short a time as possible and one wants to not have a gigantic hassle when they add or remove a computer that they sit at from their network. Nontechnical users essentially have to compromise one of the two.

I suppose that if the Finger protocol made a comeback (or WebFinger actually took off), I could make an argument about social networking. The internet has become a much more paranoid place since the opening of the internet to people like me and the dominance of Microsoft. Security through obscurity works when there are no good assumptions about the vast majority of targets. Advertising personal details to anyone on the internet who asks for them is rightfully treated as a bad thing today. So the Finger protocol won't make a comeback, and advertising details about an email address is likely to be seen as similarly bad, so WebFinger's prospects seem uncertain to me.

This is the ultimate power of the cloud: to make sacrificing privacy an inherent part of easing internet communications. What are your personal files? Dropbox wants to know. What are your personal emails? Google wants to know. What are your interests and who are your friends? Facebook wants to know. Ultimately it is possible to use email and share files easily without sharing private emails and drawing a corporation's attention to those files that you want to share. It should be possible to communicate in the ways that Facebook facilitates without compromising privacy, but the network effects of Facebook's silo may be too strong. Only on open protocols like SMTP and HTTP can you have true freedom from the companies which make things two click operations.

Java sucks

Posted by Uiri on Jul 20 2012

Java sucks, most of all, as a first programming language. Some features of the language make it a poor first language, but what makes it worse is something which can happen to any language that is popular enough: it is perfectly possible to learn Java as a first language and never go on to learn any other programming language. Never learning new programming languages is what kills. Constantly learning about new technology is beneficial in and of itself, even if one never goes on to make a single penny from that knowledge.

The sad part is that it is possible and it happens through no fault of the developer himself (or herself). Learn Java in High School. Learn Java at University. Get a cubicle developer job — writing Java. This is, of course, conjecture. But I have no doubt that this kind of career path is possible. Or was possible. I have no doubt that jobs where code quality is of secondary importance will be outsourced within a decade or two.

I just made a subtle jab at Java there. You might not have noticed it. But it is probably an unfair characterisation to say that there is no one paying for quality Java code. At least Google likely has some Java code and they expect and likely enforce good code quality. I doubt that Google engineers — even if writing Java is their day job — ignore other languages and new technology though.

So the ecosystem around programming in general and Java in particular is somewhat depressing: outside of technology companies, code quality is generally a secondary concern, and the code is generally in Java, .NET or, in the rare cases where a popular CMS is involved, PHP. I have never touched C# (and by extension .NET) but, funnily enough, I have not read anything negative about C# which didn't apply to .NET or the Microsoft ecosystem in general. It might be that C# is actually a decent language, or it might be that Microsoft's ecosystem is as insular as Apple's. I've heard that Objective-C looks funny but that's about it. I'm reluctant to touch the Microsoft ecosystem so it is unlikely I will ever learn C#. I have written a good bit of PHP and small amounts of Java, though.

Now, consider the following introduction to programming.

class HelloWorldProgram {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

The obvious question is "What is a class?" If someone is writing their first program, they should not have to know what a class is. In order to hack together some neat little programs which top out at 50 or 100 lines and are easily contained in a single file, there is absolutely no reason to bother them with the notion of a class. Now, if this question is ignored, the beginner programmer will have to form some kind of idea about what a class is. Even if class is just some kind of magic voodoo word that they put at the top of their programs, it is unlikely that their idea of what a class is will match up perfectly with the Object Oriented Programming idea of what a class is.

Ideally, one doesn't teach object oriented programming until procedural programming and concepts like subroutines, variables, syntax, data types, etc. have been talked about and are well understood by the student(s). Java throws in this class keyword, and now the students have to fight against whatever idea of what class meant (conscious or unconscious, and most likely incorrect) they had formed before the teacher talked about object oriented programming (OOP).

If the teacher does introduce OOP first, along with things like clean interfaces and encapsulation, it will encourage overengineering of code. This is not always a bad thing, but for programs which are quickly hacked together and do small, neat things, it is essentially a drag on the student. Beyond overengineering problems, the intuitive concept of giving a machine a set of instructions to execute in order (procedural programming) can get mixed up with OOP concepts, because the student has only ever written programs whose execution starts inside a class's main method.

The teacher also has to get ahead of themselves, basically, in order to properly teach OOP first. They have to talk about an object having methods (or functions or subroutines) which are called on the object and manipulate the state of the object. The best use for methods in smallish programs is to follow the Don't Repeat Yourself principle. The student needs to understand methods in order to write any nontrivial program.
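The two ideas in that paragraph can be shown side by side in a few lines. This is a sketch in Python, chosen only because it needs no wrapper class; the names are made up for illustration.

```python
def average(values):
    # One subroutine replaces the same arithmetic repeated at each call site.
    return sum(values) / len(values)


class Counter:
    """An object whose methods read and change its own state."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


math_avg = average([80, 90, 100])
art_avg = average([70, 75])

clicks = Counter()
clicks.increment()
clicks.increment()
```

The function is the Don't Repeat Yourself use of a subroutine and needs no notion of an object. The class is the OOP use, where calling a method changes the state of the object it is called on.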

It is possible to introduce the concepts of objects and method calls early, but it is hard to introduce writing classes and methods early because they are overkill for something like Hello World above, or even FizzBuzz. Of course most programs call libraries (whether explicitly included or included as part of the language); even Hello World calls the println method on the PrintStream object out, which is a public static field of the System class. That much is intuitive, at least as a black box call which puts stuff on my screen.
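For comparison, here is a sketch of FizzBuzz written procedurally in Python (my pick for illustration, not a language this post is about). There is one subroutine and a loop, and no class anywhere.

```python
def fizzbuzz(n):
    # Check the combined case first so multiples of 15 are not cut short.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


lines = [fizzbuzz(i) for i in range(1, 16)]
for line in lines:
    print(line)
```

The call to print is still a black box library call, but the student never has to write or explain a class to get there.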

The problems can be even more serious than that. Even though it is obvious from calls like System.out.println that there is more to a class than a "main" method, if neither OOP nor subroutines are brought up, a programmer can be left frustrated, working in what is essentially a purely procedural environment inside Java's nice OOP wrapping. Or I could be wrong to blame Java for this kind of thing and programming courses at the high school level are simply broken.

I have more objections to Java than its enforcement of OOP on programs not meant to be OOP, the effect this has on beginner programmers, and its popularity (primitive data types instead of objects which represent them, for example), but I think that this post is already nearly double my usual length and so I'll end here. Maybe if Java were more like C than C++ I'd dislike it less.

Office suites are dead to me

Posted by Uiri on Jul 6 2012

The concept of the office suite is interesting. There is a word processor for composing documents, a slideshow program for presentations, a spreadsheet program for... manipulating tabular data. Microsoft Office has Word, PowerPoint and Excel while LibreOffice (and OpenOffice.org) has Writer, Impress and Calc. I don't see any particular need for any of these three programs any more, though. I never used spreadsheets very much. The fanciest thing I've used spreadsheets for is to simulate sports competitions. I may do a presentation in Impress or Prezi, but if I have the choice, it is better to use a web page with some HTML5, CSS3 and JavaScript magic. LaTeX is a very strong replacement for a word processor for me, though. I doubt I'll ever go back to WYSIWYG document creation.

LaTeX is in itself interesting. It is hard to describe accurately because it is really only one out of a group of programs related to TeX. At its core, it is a markup language for documents. Its tools produce results with superior typesetting when compared to word processors. The WYSIWYG model competes by being a lot less geeky. LaTeX wins out for me precisely because its format is plain text like everything else that I like to handle. All my code and blog posts are ultimately text. My blog posts are HTML and my code is in whatever language, but I use Emacs for both. And I can use Emacs for LaTeX too.

I say Emacs, but what I really mean is my text editor of choice. The editor isn't the important part; the important part is the permanence. Microsoft Word files which are more than 15 years old can't be read by Microsoft Word 2007. This is utterly ridiculous. TeX is essentially set in stone because it was a means for Donald Knuth to have his books typeset properly. It has been frozen since 1989, which means that .tex files as old as 23 years, likely even older, can still be compiled correctly to .pdf or whatever other printable format.

LaTeX may be clearly superior but I don't expect its adoption to pick up any time soon. Microsoft's Office suite has network effects which are very strong. Most people think in terms of the office suite model; that's the raison d'être of LibreOffice (and OpenOffice.org, of course). I may end up using Calc or Excel at some point in the future but it is much more likely that I'll code a solution to my problem using a scripting language and a library like numpy. It is unlikely I will be able to convince someone to collaborate with me on a web page based presentation, so use of Impress, Prezi or PowerPoint is likely unavoidable. The office suite has, for me, become permanently decoupled.
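As an example of coding a solution instead of reaching for a spreadsheet, here is a sketch of the sports simulation mentioned earlier. It uses only Python's standard library to stay self-contained (numpy would do just as well for anything numerically heavier). The team names and the coin-flip model are made up for illustration.

```python
import random
from itertools import combinations


def simulate_round_robin(teams, seed=None):
    """Play every pairing once with a coin-flip winner; return win counts."""
    rng = random.Random(seed)
    wins = {team: 0 for team in teams}
    for home, away in combinations(teams, 2):
        winner = rng.choice((home, away))
        wins[winner] += 1
    return wins


teams = ["Ants", "Bees", "Crows", "Deer"]  # hypothetical team names
wins = simulate_round_robin(teams, seed=2012)
standings = sorted(wins.items(), key=lambda item: item[1], reverse=True)
```

Passing the same seed reproduces the same season, which is harder to arrange with a spreadsheet's random number function.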

Prezi will lead people who don't regularly use text editors to use PowerPoint less. Trello is a web app which is basically list management and moving items between lists and, if its creators are correct in their assumption about Excel's most popular use, it may mean that people use Excel less and less. If a web app comes along which replaces the primary use case(s) of Word, we may start to see a decline in Office sales as people realize they aren't using it very much.

Data Loss and Backups

Posted by Uiri on Jun 8 2012

If you have ever had a hard drive fail on you, you know how stressful it can be. Thankfully I haven't had that experience. It is rather tricky to get data off of a busted hard drive without special tools. Hard drives work mainly by having a head read off of magnetic platters. The platters are where the actual data is stored, and the head reads the magnetization of a small section of a platter as a bit. If the platters become exposed to air you will have to prioritize which data to recover first, because the dust in the air will quickly ruin the platters forever.

This is the main scenario which backups are meant to guard against: hardware failure causing serious data loss. Of course there are other things which can cause data loss. Filesystem corruption may leave the disk accessible by the computer but without a way to tell which parts of which platter belong to which file. A mistyped rm or dd command can cause the OS to start marking certain parts of the disk as empty or overwriting them. The effects of rm are recoverable in many cases, because it only removes the reference to the data, while the effects of dd usually are not, because it writes over the data itself. That's the difference between ReMoving and Data Definition (the latter copies data from one source to another, converting it as appropriate).
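The difference between removing a name and overwriting data can be seen on a small scale with a rough sketch like the following. It uses a hard link as a stand-in for a recovery tool: a second name that still points at the same data. The file names are made up, and the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "report.txt")
second_name = os.path.join(workdir, "report-link.txt")

with open(original, "w") as f:
    f.write("important data")
os.link(original, second_name)  # a second directory entry for the same data

# Like rm: drop one name. The stored bytes are untouched.
os.remove(original)
with open(second_name) as f:
    after_remove = f.read()

# Like a mistyped dd: write new bytes over the existing ones.
with open(second_name, "r+b") as f:
    f.write(b"\0" * len("important data"))
with open(second_name, "rb") as f:
    after_overwrite = f.read()
```

After the remove, the contents are still readable through the other name. After the overwrite, nothing of the original is left to find. In a real rm with no second link, the space is merely marked free, so recovery depends on getting to it before something else is written there.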

If your hard drive gets wrecked in a flood or fire, or even gets stolen, that can also cause you to lose access to your data. If it is stolen especially, there is no chance of recovery because the physical drive is gone, past the point of no return. If you are subject to a virus (due to an ID-ten-T error, using a Microsoft product, or both) it may just scribble all over your hard drive or worse. If the hard drive is dropped while spinning, it isn't possible to predict whether or not it will survive.

All of these scenarios are mitigated by keeping proper backups. The most basic backup is to the same hard drive. This way, if the original is accidentally deleted forever or overwritten, you can copy it back from the backup. If you back up to another partition on the same drive, it will also save you from filesystem corruption and an accidental dd over the whole main partition. This kind of backup only saves you from the least severe causes of data loss.

A more sensible step up, given that failure of the drive itself is the most likely thing to take your data, is backing up to a secondary hard drive. If you keep this hard drive close at hand, it is a lot easier to keep backups recent. Keeping the hard drive close will also subject it to the same risk of fire and flood as your main hard drive. Moving it to another location is smart, but you still need to update your backups at least once a month (unless you're willing to lose multiple months' worth of data) and you need somewhere else to move it to.
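A backup to a second drive can be as simple as a copy followed by a check that the copy matches. Here is a minimal sketch in Python, with throwaway temporary directories standing in for the two drives; the paths and function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source_dir, backup_dir):
    """Copy source_dir into backup_dir and return files whose copies differ."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    shutil.copytree(source_dir, backup_dir, dirs_exist_ok=True)
    mismatched = []
    for src in source_dir.rglob("*"):
        if src.is_file():
            copy = backup_dir / src.relative_to(source_dir)
            if sha256_of(src) != sha256_of(copy):
                mismatched.append(src)
    return mismatched


# Demonstration with throwaway directories standing in for two drives.
root = Path(tempfile.mkdtemp())
(root / "main" / "docs").mkdir(parents=True)
(root / "main" / "docs" / "thesis.tex").write_text("\\documentclass{article}")
problems = back_up(root / "main", root / "secondary")
```

An empty list means every file on the "secondary" side hashes the same as its original. This sketch overwrites the previous copy each time, so it protects against drive failure but not against backing up a file that was already corrupted.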

The ultimate solution to this is a combined on-site and off-site backup strategy, which is likely overkill for personal backups but likely to be standard for corporate backups. Corporations are also likely to use magnetic tape over hard drives for reliability reasons. Having no backups is a recipe for disaster, so make sure you do something to protect yourself against data loss.