« November 2007 | Main | January 2008 »

December 2007 Archives

December 6, 2007

SAFE Act considered harmful

Ug. At it again with making companies acting as police and helping create the dystopia corporate states we all love to hate. This stuff is a really bad idea. I have been meaning to blog about it for a bit. Will try to get to it later today.

December 24, 2007

Simple Consistency

For this discussion we are interested in a database system which can provide a availability guarantee for data reads and writing such that a reader must be able obtain the information supplied by the last successful write with some probability n and a writer must be able to record its data to the database with some other probability n. To achieve this we must guarantee that the probability of critical component failure is less then n for either reads or writes. This can be done either by acquiring components which have a failure rate less then n or by by combining components in a redundant configuration such that the probability of all redundant components failing is less then  n. This article concerns it self with the second approach and the issues that arise from a system where one redundantly stores data on more then one distinct node in a cluster with to goal of achieving a low probability of failure.

With a data replication approach to availability we are bought the the primary concern of this work, detecting and handling error states which cause data to become inconsistent across nodes replicating data. Inconsistencies in replication create ambiguity in the definition of a successful wright and obfuscate the value of the last wright.

There are three different errors which can endanger consistency of writes:
  1. Partial write: The write fails on one or more nodes.
  2. Corrupt write : The data becomes corrupt on one or more nodes.
  3. Out of order write: Two or more writes are applied to the same datum but in a inconsistent order across two or more nodes.
Similarly there are a related but distinct set of errors that can occur during reads:
  1. Corrupt read : The data becomes corrupt on transmission to the reader.
  2. Write read race: A reader performs a read while a write is occurring.
All these errors are indifferentiable to the reader and all present the error state where the values read are inconsistent. In the case that any of these errors occur if one were to perform two reads of the same datum but from different replicants it is possible to get different results. Similarly, if one were to perform a read for a single datum across all replicants one would get back a inconsistent set of answers. In response we can either attempt to avoid these error states or we can define write and read success so that it is consistent under all these cases. In this article we chose the later approach and introduce idea of voting which will allow us to correct inconsistencies up to a configurable limit.

When voting is used to decide on a consistent version of a datum the client must query replants until a majority of the total number of replicants agree on the value of datum. This value will then be considered the value for the query.  This method can correct n/2 - 1 inconsistencies where n is the number of replicas of the data. If more errors have occurred then a consistent read is impossible.

Write failures can largely be ignored in this approach, as they will merely cause one out of n reads to fail which is acceptable as long as it is still possible for n/2 - 1 reads to succeed.

This outlines the minimal requirements for the consistency model. This article will be followed by more with some enhancements to this scheme which improve on it by providing methods for reducing the number of states where data can be inconsistent ( useful for high concurrency datums ), providing  methods for repairing datums where state has become inconsistent, and describing  optimizations the the performance of a database using this method of replication consistency. 

Vectorized Classic Video Games


mario.png



Google Blogoscoped has a great gallery of graphics that are the result of taking old 8-bit and 16-bit game graphics vectorizing them with VectorMagic and scaling them up. They range from creative to abstract and they are all fun.

link

December 25, 2007

Why Lame?

Why is it that every worm for social networks is lame; written so that it has to link back to some site to get its content for infection? Would it be so hard for the author to just write it as a quine, it's not like there isn't lots of examples code. ( I know I should not wish that attackers did a better job but I really don't like to see work half done. )

Corporate Police and Due Process

We are entering a time when it is becoming clear that corporations are a means to sidestep the judicial process. Today, and to a lesser extent in the past, companies are put into the position of adjudicating decisions of legal concern regarding peoples' actions within their domain of control. This post will use Flickr and issues around its Safe Search for our examples; but the symptoms described are common.

Safe Search is a feature meant to partition the photos uploaded to Flickr into three groups which they describe as follows:

1. Safety Level
  • Safe - Content suitable for a global, public audience
  • Moderate - If you're not sure whether your content is suitable for a global, public audience but you think that it doesn't need to be restricted per se, this category is for you
  • Restricted - This is content you probably wouldn't show to your mum, and definitely shouldn't be seen by kids
Where there are many reasons for the rating system the one we will concern ourselves with is that of compliance with the laws of Singapore, Hong Kong, Korea, and Germany. In these countries people's ability to view content is limited:

Note: If your Yahoo! ID is based in Singapore, Hong Kong or Korea you will only be able to view safe content based on your local Terms of Service so won't be able to turn SafeSearch off. If your Yahoo! ID is based in Germany you are not able to view restricted content due to your local Terms of Service.

The Register reports that the restrictions in Germany are due to "stricter legislation and penalties in that country". Here we see Safety Level being used to enforce law, or at least Yahoo's interpretation of German law. Moreover, to facilitate enforcement of the law, Flickr is compelled to enforce a policy that all users, not just those in restricted countries, accurately label photos. Flickr's policy enforcement is where things become problematic.

If we take that we can decompose governance it to Legislative, Judicial, and Executive components, Flickr is acting in both Judicial, and Executive roles with only the Legislative role left to the government of the German people. The result being that Germans are denied the procedures and laws of their country meant to ensure that law is applied justly. ( If this were not bad enough, these rules are applied to all Flickr users not just German ones. )

The process that befalls a user that mislabels a photo under the watch of Flickr is as follows:

1. An anonymous person flags a photo as under-moderated:

Staff hear about this sort of thing because your fellow members can flag photos around the site if they feel that you have categorized things incorrectly, or they may even send a report to us that some of your content is offensive.

2. An anonymous Flickr employee makes a judgment about the flagged photo.

3. If the employee finds the photo to be under-moderated, the entire account hosting the photo is forcibly moderated to a level of the employee's choosing. A second offense can end in account deletion.

In this process the user under judgment is subject to a secret trial with secret evidence initiated by a nameless accuser. The accused does not even know they have been tried unless they are found guilty. They are not informed which act of uploading was their crime; and punished though a ghettoization from which they have very little ability to appeal. ( One can request a re-review but they don't tell users which photos were offensive so it can be hard to correct one's actions. )

Some would argue that this issue will be temporary as the market will correct; but I think that is naive given that social applications exhibit network effect making it hard for people to leave. We need to have a discussion about how we should address this as a culture. What are the boundaries how lawmakers can ask companies to act as enforcers of the law? How will due processes evolve?

CSS Side Channel Attack

So there is this older, known hack where you can walk the DOM and discover if a link on a page has been visited:

https://bugzilla.mozilla.org/show_bug.cgi?id=147777

This has all sorts of uses as a vector for phishing sites and XSS attacks. But what I just realized is that sites can conspire to use the visited attribute for a communication back channel.

If one site wants to communicate a 32bit number, say a user id, with another site covertly it can use the visited attribute. The method is as follows:

The transmitting site will create 32 URLs with names like foo.com/bit_1.html throu foo.com/bit_2.html. Then in some page the sender takes the number they want to transmit and for each bit that is true, they embed a hidden frame with the URL of that bit. After this page has been viewed by the subject of the attack, any other page on the internet which knows the bit URLs can test to see if those URLs have been visited and reconstruct the number that was set.

There are two main shortcomings of this approach that are clear. First, the stored value only persists as long as the visited link info lasts in the browser, which I think is nine days by default in FF as far as I can tell. Secondly, there is no way to unset bits. To deal with this, one could add a sequence to the bit urls:

foo.com/sequence_1.html
foo.com/bit_1_1.html

foo.com/sequence_2.html
foo.com/bit_1_2.html

Now before you want to set a number you walk up the sequence URLs 'till you find one that is not set then you set the bits assorted with that sequence.

This would basically allow one to have cookies that can be read from any page on the internet with the added feature that there is no network IO required to read the data; once someone has had the value set it can even be read even offline.

December 27, 2007

The 5 minute rule

There is a great paper every computer engineer should read, The 5 minute rule for trading memory for disc accesses and the 10 byte rule for trading memory for CPU time:

Abstract

If an item is accessed frequently enough, it should be main memory resident. For current technology, "frequently enough" means about every five minutes. Along a similar vein, one can frequently trade memory space for CPU time. For example, bits can be packed in a byte at the expense of extra instructions to extract the bits. It makes economic sense to spend ten bytes of main memory to save one instruction per second. These results depend on current price ratios of processors, memory and disc accesses. These ratios are changing and hence the constants in the rules are changing.

Published by Tandem in 1986 it is still relevant today. Quick and fun, the paper provides a analytical tool that is powerful beyond it's original scope. It has been updated twice, ten years later (1997), and twenty years later (2007). The ten year paper show how the rule still stood at that time. The twenty year later paper provides new insights in to the topic, has lots of good analytical work, and should be read if you are trying to apply the 5 minute rule to modern systems.

If I was teaching computer science The 5 Minute Rule would be required reading.

About December 2007

This page contains all entries posted to 0x0000 in December 2007. They are listed from oldest to newest.

November 2007 is the previous archive.

January 2008 is the next archive.

Many more can be found on the main index page or by looking through the archives.

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type 3.33