July 26, 2008

Cloud storage unreliable?

There is some press about an outage of Amazon's S3 cloud storage service. Ars Technica raises the point:

Fundamentally, cloud storage that centralizes functionality can also centralize failures in functionality, making cloud services vulnerable to widespread outages.

But centralization and failure are not unique to cloud storage. CDNs, colo facilities, and hosting providers can all be single points of failure. I can recall an outage at 365 Main in SF that brought down most of the "web 2.0". Economies of scale dictate that we will have large central resources. It is up to good application designers to build with the expectation that services will fail.

May 25, 2008

Powers of 10

Classic science short about the scale of the universe.

May 17, 2008

Context for the surveillance debate

The current debate about surveillance turns out not to be a product of September 11th; it was instead born of the move to mobile telephony and digital switching that started in the 1980s. govexec.com has a thorough article that gives the history of the debate here. It is well worth a read if you have the time.

May 5, 2008

Waiting at the door

IMG_1798

When running in the underground portions of the Muni Metro system, the trains are controlled by computer, and because of this they stop at the same place along the platform every time. Unlike BART, which is similarly controlled by computer, the spots where the doors will open are not marked out. Even without these markings it is still possible to tell where one will need to board. As people enter and leave the trains, their feet wipe the dirt from the very edge of the train platform, leaving clean marks at each door opening. By observing and counting off the marks in sets of four it is possible to tell where to stand to enter at a particular door on a particular train.

January 24, 2008

Changing the world by treating people as humans

In this TED video Bill Strickland talks about how he is changing the world through education and compassion. I quote the description of the video below but I don't think it does it justice. I strongly recommend you watch the video.

With subtle accompaniment by longtime friend Herbie Hancock, and a slide show that has opened the minds (and pocketbooks) of CEOs across the country, Bill Strickland tells a quiet and astonishing tale of redemption through arts, music and unlikely partnerships.

December 27, 2007

The 5 minute rule

There is a great paper every computer engineer should read, The 5 minute rule for trading memory for disc accesses and the 10 byte rule for trading memory for CPU time:

Abstract

If an item is accessed frequently enough, it should be main memory resident. For current technology, "frequently enough" means about every five minutes. Along a similar vein, one can frequently trade memory space for CPU time. For example, bits can be packed in a byte at the expense of extra instructions to extract the bits. It makes economic sense to spend ten bytes of main memory to save one instruction per second. These results depend on current price ratios of processors, memory and disc accesses. These ratios are changing and hence the constants in the rules are changing.

Published by Tandem in 1986, it is still relevant today. Quick and fun, the paper provides an analytical tool that is powerful beyond its original scope. It has been updated twice: ten years later (1997) and twenty years later (2007). The ten-year paper shows how the rule still stood at that time. The twenty-year paper provides new insights into the topic, has lots of good analytical work, and should be read if you are trying to apply the 5 minute rule to modern systems.
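The arithmetic behind the rule is easy to play with. Below is a minimal sketch, assuming the break-even form as I understand it from the follow-up papers: a technology ratio (pages per MB of RAM over disk accesses per second) multiplied by an economic ratio (price of a disk over price of a MB of RAM). The function name, parameter names, and the sample numbers are placeholders for illustration, not figures quoted from the papers.

```python
def break_even_interval_seconds(pages_per_mb_ram,
                                accesses_per_sec_per_disk,
                                price_per_disk,
                                price_per_mb_ram):
    """Return the access interval (seconds) below which keeping a page
    in memory is cheaper than re-reading it from disk.

    All four inputs are assumptions supplied by the caller; plug in
    current prices and device characteristics to get a current answer.
    """
    # How many pages one MB of RAM holds per disk access/second available.
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    # How many MB of RAM one disk drive costs.
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


if __name__ == "__main__":
    # Placeholder inputs: 8 KB pages (128 per MB), 64 accesses/s per disk,
    # a 2000-unit disk and RAM at 15 units per MB.
    seconds = break_even_interval_seconds(128, 64, 2000, 15)
    print(f"break-even interval: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

With these placeholder inputs the interval comes out at a little under five minutes; changing any of the four ratios moves the constant, which is exactly the point the abstract makes.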

If I were teaching computer science, The 5 Minute Rule would be required reading.

December 25, 2007

CSS Side Channel Attack

So there is this older, known hack where you can walk the DOM and discover if a link on a page has been visited:

https://bugzilla.mozilla.org/show_bug.cgi?id=147777

This has all sorts of uses as a vector for phishing sites and XSS attacks. But what I just realized is that sites can conspire to use the visited attribute for a communication back channel.

If one site wants to covertly communicate a 32-bit number, say a user id, to another site, it can use the visited attribute. The method is as follows:

The transmitting site will create 32 URLs with names like foo.com/bit_1.html through foo.com/bit_32.html. Then in some page the sender takes the number they want to transmit and, for each bit that is true, embeds a hidden frame with the URL of that bit. After this page has been viewed by the subject of the attack, any other page on the internet which knows the bit URLs can test to see if those URLs have been visited and reconstruct the number that was set.
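The encoding itself is just a bitmap spread over URLs. Here is a minimal sketch of that mapping, with the browser's history modeled as a plain Python set; the function names and the is_visited callback are hypothetical stand-ins, and the real visited check would of course happen in the browser, not in Python.

```python
BASE = "foo.com"   # example host from the description above
BITS = 32          # width of the number being passed


def bit_url(bit):
    """URL that stands for one bit position (1-based)."""
    return f"{BASE}/bit_{bit}.html"


def urls_to_embed(value):
    """Sender side: the URLs to load in hidden frames for each set bit."""
    return [bit_url(i + 1) for i in range(BITS) if (value >> i) & 1]


def reconstruct(is_visited):
    """Receiver side: rebuild the number from which bit URLs look visited.

    is_visited is a hypothetical callback answering "has this URL been
    visited?"; here it is backed by a set that simulates browser history.
    """
    value = 0
    for i in range(BITS):
        if is_visited(bit_url(i + 1)):
            value |= 1 << i
    return value


if __name__ == "__main__":
    history = set(urls_to_embed(0xDEADBEEF))    # simulated visited links
    print(hex(reconstruct(history.__contains__)))
```

Note that the receiver only ever asks 32 yes/no questions, which is why it needs to know the URL scheme in advance.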

There are two clear shortcomings of this approach. First, the stored value only persists as long as the visited link info lasts in the browser, which as far as I can tell is nine days by default in FF. Second, there is no way to unset bits. To deal with this, one could add a sequence to the bit URLs:

foo.com/sequence_1.html
foo.com/bit_1_1.html

foo.com/sequence_2.html
foo.com/bit_1_2.html

Now before you want to set a number you walk up the sequence URLs until you find one that is not set, then you set the bits associated with that sequence.
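A sketch of the sequenced variant, again with browser history simulated as a set. The URL patterns follow the examples above (bit_<bit>_<sequence>.html); the function names and the choice to have readers use the highest sequence that is set are my assumptions about how the pieces would fit together.

```python
def sequence_url(seq):
    """Marker URL recording that sequence number seq has been used."""
    return f"foo.com/sequence_{seq}.html"


def seq_bit_url(bit, seq):
    """URL for one bit (1-based) within one sequence."""
    return f"foo.com/bit_{bit}_{seq}.html"


def write_value(visited, value, bits=32):
    """Claim the first unused sequence and set its bits. Returns the sequence."""
    seq = 1
    while sequence_url(seq) in visited:      # walk up until one is not set
        seq += 1
    visited.add(sequence_url(seq))
    for i in range(bits):
        if (value >> i) & 1:
            visited.add(seq_bit_url(i + 1, seq))
    return seq


def read_value(visited, bits=32):
    """Read the bits of the highest sequence set, or None if nothing is stored."""
    seq = 0
    while sequence_url(seq + 1) in visited:
        seq += 1
    if seq == 0:
        return None
    return sum(1 << i for i in range(bits)
               if seq_bit_url(i + 1, seq) in visited)


if __name__ == "__main__":
    history = set()                 # simulated visited-link store
    write_value(history, 5)
    write_value(history, 2)         # "overwrites" by moving to sequence 2
    print(read_value(history))
```

Old sequences are never cleared; they simply stop being read once a newer sequence marker exists.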

This would basically allow one to have cookies that can be read from any page on the internet with the added feature that there is no network IO required to read the data; once someone has had the value set it can be read even offline.

Corporate Police and Due Process

We are entering a time when it is becoming clear that corporations are a means to sidestep the judicial process. Today, and to a lesser extent in the past, companies are put into the position of adjudicating decisions of legal concern regarding people's actions within their domain of control. This post will use Flickr and issues around its Safe Search for our examples, but the symptoms described are common.

Safe Search is a feature meant to partition the photos uploaded to Flickr into three groups which they describe as follows:

1. Safety Level
  • Safe - Content suitable for a global, public audience
  • Moderate - If you're not sure whether your content is suitable for a global, public audience but you think that it doesn't need to be restricted per se, this category is for you
  • Restricted - This is content you probably wouldn't show to your mum, and definitely shouldn't be seen by kids
While there are many reasons for the rating system, the one we will concern ourselves with is compliance with the laws of Singapore, Hong Kong, Korea, and Germany. In these countries people's ability to view content is limited:

Note: If your Yahoo! ID is based in Singapore, Hong Kong or Korea you will only be able to view safe content based on your local Terms of Service so won't be able to turn SafeSearch off. If your Yahoo! ID is based in Germany you are not able to view restricted content due to your local Terms of Service.

The Register reports that the restrictions in Germany are due to "stricter legislation and penalties in that country". Here we see Safety Level being used to enforce law, or at least Yahoo's interpretation of German law. Moreover, to facilitate enforcement of the law, Flickr is compelled to enforce a policy that all users, not just those in restricted countries, accurately label photos. Flickr's policy enforcement is where things become problematic.

If we accept that governance can be decomposed into Legislative, Judicial, and Executive components, Flickr is acting in both the Judicial and Executive roles, with only the Legislative role left to the government of the German people. The result is that Germans are denied the procedures and laws of their country meant to ensure that law is applied justly. ( If this were not bad enough, these rules are applied to all Flickr users, not just German ones. )

The process that befalls a user that mislabels a photo under the watch of Flickr is as follows:

1. An anonymous person flags a photo as under-moderated:

Staff hear about this sort of thing because your fellow members can flag photos around the site if they feel that you have categorized things incorrectly, or they may even send a report to us that some of your content is offensive.

2. An anonymous Flickr employee makes a judgment about the flagged photo.

3. If the employee finds the photo to be under-moderated, the entire account hosting the photo is forcibly moderated to a level of the employee's choosing. A second offense can end in account deletion.

In this process the user under judgment is subject to a secret trial with secret evidence initiated by a nameless accuser. The accused does not even know they have been tried unless they are found guilty. They are not informed which act of uploading was their crime, and they are punished through a ghettoization from which they have very little ability to appeal. ( One can request a re-review but they don't tell users which photos were offensive so it can be hard to correct one's actions. )

Some would argue that this issue will be temporary as the market will correct, but I think that is naive given that social applications exhibit network effects, making it hard for people to leave. We need to have a discussion about how we should address this as a culture. What are the boundaries on how lawmakers can ask companies to act as enforcers of the law? How will due process evolve?

Why Lame?

Why is it that every worm for social networks is lame, written so that it has to link back to some site to get its content for infection? Would it be so hard for the author to just write it as a quine? It's not like there isn't lots of example code. ( I know I should not wish that attackers did a better job but I really don't like to see work half done. )

December 24, 2007

Vectorized Classic Video Games


mario.png



Google Blogoscoped has a great gallery of graphics that are the result of taking old 8-bit and 16-bit game graphics, vectorizing them with VectorMagic, and scaling them up. They range from creative to abstract and they are all fun.

link

Simple Consistency

For this discussion we are interested in a database system which can provide an availability guarantee for data reads and writes, such that a reader must be able to obtain the information supplied by the last successful write with some probability p and a writer must be able to record its data to the database with some other probability q. To achieve this we must guarantee that the probability of critical component failure is less than 1 - p for reads and less than 1 - q for writes. This can be done either by acquiring components which have a failure rate below that bound or by combining components in a redundant configuration such that the probability of all redundant components failing is below it. This article concerns itself with the second approach and the issues that arise from a system where one redundantly stores data on more than one distinct node in a cluster with the goal of achieving a low probability of failure.
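To make the redundancy argument concrete, here is a minimal sketch of the arithmetic. It assumes node failures are independent and that every node fails with the same probability, neither of which is stated above; the function names are hypothetical.

```python
def all_fail_probability(node_failure_probability, replicas):
    """Probability that every one of `replicas` nodes has failed,
    assuming independent failures with identical probability."""
    return node_failure_probability ** replicas


def replicas_needed(node_failure_probability, failure_bound):
    """Smallest replica count whose combined failure probability
    is strictly below failure_bound."""
    if not 0 < node_failure_probability < 1:
        raise ValueError("node_failure_probability must be in (0, 1)")
    if not 0 < failure_bound < 1:
        raise ValueError("failure_bound must be in (0, 1)")
    replicas = 1
    while all_fail_probability(node_failure_probability, replicas) >= failure_bound:
        replicas += 1
    return replicas


if __name__ == "__main__":
    # Nodes that are each down 1% of the time, target below 1 in 100,000.
    print(replicas_needed(0.01, 1e-5))
```

Correlated failures (shared power, shared switch, shared bug) break the independence assumption, so in practice the real count can be higher than this suggests.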

With a data replication approach to availability we are brought to the primary concern of this work: detecting and handling error states which cause data to become inconsistent across the nodes replicating it. Inconsistencies in replication create ambiguity in the definition of a successful write and obscure the value of the last write.

There are three different errors which can endanger consistency of writes:
  1. Partial write: The write fails on one or more nodes.
  2. Corrupt write: The data becomes corrupt on one or more nodes.
  3. Out of order write: Two or more writes are applied to the same datum but in an inconsistent order across two or more nodes.
Similarly there is a related but distinct set of errors that can occur during reads:
  1. Corrupt read: The data becomes corrupt on transmission to the reader.
  2. Write read race: A reader performs a read while a write is occurring.
All these errors are indistinguishable to the reader and all present the same error state, where the values read are inconsistent. If any of these errors occur and one were to perform two reads of the same datum from different replicas, it is possible to get different results. Similarly, if one were to perform a read for a single datum across all replicas one would get back an inconsistent set of answers. In response we can either attempt to avoid these error states or we can define write and read success so that it is consistent under all these cases. In this article we choose the latter approach and introduce the idea of voting, which will allow us to correct inconsistencies up to a configurable limit.

When voting is used to decide on a consistent version of a datum the client must query replicas until a majority of the total number of replicas agree on the value of the datum. This value will then be considered the value for the query. This method can correct n/2 - 1 inconsistencies where n is the number of replicas of the data. If more errors have occurred then a consistent read is impossible.

Write failures can largely be ignored in this approach, as they will merely cause one out of n reads to fail, which is acceptable as long as it is still possible for n/2 + 1 reads to succeed.
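The voting read described above can be sketched in a few lines. This is an illustration under assumptions, not a full client: replica answers are passed in as a list, None stands for a replica that failed to answer, values are assumed hashable, and the names voted_read and NoMajority are made up for the example.

```python
from collections import Counter


class NoMajority(Exception):
    """Raised when no value is held by a majority of all replicas."""


def voted_read(replica_answers):
    """Return the value a majority of ALL replicas agree on.

    replica_answers has one entry per replica, in the order queried;
    None means that replica did not answer. The majority is counted
    against the total replica count, not just the replicas that answered.
    """
    answers = list(replica_answers)
    needed = len(answers) // 2 + 1          # strict majority of n
    votes = Counter()
    for answer in answers:
        if answer is None:                  # failed or missing replica
            continue
        votes[answer] += 1
        if votes[answer] >= needed:         # stop as soon as a majority agrees
            return answer
    raise NoMajority(f"no value reached {needed} of {len(answers)} votes")


if __name__ == "__main__":
    # Five replicas: one stale, one unreachable, three agreeing.
    print(voted_read(["v2", "v2", "v1", None, "v2"]))
```

A stale, corrupt, or missing replica all look the same to this function, which matches the observation that the reader cannot tell the error cases apart.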

This outlines the minimal requirements for the consistency model. This article will be followed by more with some enhancements to this scheme which improve on it by providing methods for reducing the number of states where data can be inconsistent ( useful for high concurrency datums ), providing methods for repairing datums where state has become inconsistent, and describing optimizations to the performance of a database using this method of replication consistency.

December 6, 2007

SAFE Act considered harmful

Ugh. At it again with making companies act as police and helping create the dystopian corporate states we all love to hate. This stuff is a really bad idea. I have been meaning to blog about it for a bit. Will try to get to it later today.

November 28, 2007

Scaling Reads: Replication vs Partitioning

The common method of read scaling is to copy read-heavy data to more nodes. This increases the available read throughput of a datum by increasing the total amount of resources dedicated to serving it. This will always be necessary when the read load for a single datum surpasses the resources of a single node (IO, CPU, network, etc). This method scales well but has the downside that it is often not efficient with respect to storage space. This is not often an issue for data on disk but often is for data cached in memory. The following example demonstrates the difference:

Let us posit that there are two nodes { node1, node2 } and a data set having four datums, { A, B, C, D }, with the load distribution <A:40%, B:40%, C:10%, D:10%>. To scale reads one could fully replicate the datums across both nodes, node1:{ A, B, C, D }, node2:{ A, B, C, D }. We would then distribute the query load evenly between the two nodes so they were each handling one half of the total load for the data. A second option would be to share the load between the two nodes by partitioning the data across the nodes. One such partitioning is node1:{ A, C }, node2:{ B, D }, resulting in the load distribution node1:<A:40%, C:10%>, node2:<B:40%, D:10%>. With this partitioning the read load is still evenly distributed across the nodes but only half as much data is stored at each node.
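The comparison above can be checked with a few lines of Python. This is only a sketch of the example: it assumes reads for a datum are split evenly across every node holding a copy, and the helper name node_stats is made up.

```python
from collections import Counter

# Share of total read load per datum, from the example above.
LOAD = {"A": 0.40, "B": 0.40, "C": 0.10, "D": 0.10}

REPLICATED = {"node1": ["A", "B", "C", "D"], "node2": ["A", "B", "C", "D"]}
PARTITIONED = {"node1": ["A", "C"], "node2": ["B", "D"]}


def node_stats(placement, load):
    """Return {node: (share of read load, number of datums stored)}.

    Assumes queries for a datum are spread evenly over its copies.
    """
    copies = Counter(d for datums in placement.values() for d in datums)
    stats = {}
    for node, datums in placement.items():
        node_load = sum(load[d] / copies[d] for d in datums)
        stats[node] = (node_load, len(datums))
    return stats


if __name__ == "__main__":
    for name, placement in (("replicated", REPLICATED),
                            ("partitioned", PARTITIONED)):
        for node, (share, stored) in node_stats(placement, LOAD).items():
            print(f"{name:12} {node}: {share:.0%} of reads, {stored} datums")
```

Both layouts put half the read load on each node; the partitioned one stores two datums per node instead of four.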

Even though this example appears to assume that reliable load data is available for each datum, it is possible to achieve an even distribution of load by methods such as repartitioning with only node-level knowledge. One such method would periodically select a random datum from a node with above-average load and move it to a node with below-average load. Such a scheme should lead to a balanced cluster over time. ( This scheme will likely be discussed in more detail in a later post. )
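One step of such a rebalancer might look like the sketch below. It is a guess at the shape of the scheme, not a worked-out design: node_load is assumed to be measured per node by some outside monitor, the function name and the rng parameter are hypothetical, and nothing here proves convergence.

```python
import random


def rebalance_step(placement, node_load, rng=random):
    """Move one randomly chosen datum from an above-average node to a
    below-average node, using only node-level load numbers.

    placement: {node: list of datums}, modified in place.
    node_load: {node: measured load}, supplied by the caller.
    Returns (datum, source, destination), or None if nothing to move.
    """
    average = sum(node_load.values()) / len(node_load)
    hot = [n for n, load in node_load.items() if load > average and placement[n]]
    cold = [n for n, load in node_load.items() if load < average]
    if not hot or not cold:
        return None                      # already balanced, or nothing movable
    source = rng.choice(hot)
    destination = rng.choice(cold)
    datum = rng.choice(placement[source])
    placement[source].remove(datum)
    placement[destination].append(datum)
    return datum, source, destination


if __name__ == "__main__":
    placement = {"node1": ["A", "B"], "node2": ["C", "D"]}
    measured = {"node1": 0.8, "node2": 0.2}   # made-up node-level readings
    print(rebalance_step(placement, measured))
    print(placement)
```

The step would be repeated periodically with fresh load measurements; a moved datum's routing information has to be updated at the same time, which is the complexity cost mentioned for partitioning.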

The cost of using partitioning as a read scaling method is the added complexity of creating the partitions on the data and maintaining the routing information required to deliver queries to the appropriate node.

Flashless

So I have long been frustrated with the fact that my camera's flash kind of sucks but that shooting unassisted in low light is nigh impossible (at least if your subject is breathing). So I had this idea that I could solve the problem and create some fun shots at the same time by using a flashlight to illuminate my subject. Here are the results of my first experiment with the idea.

Party like it's 1989

In an article in New York Magazine we learn that the music industry was so tech-phobic that they did not even try to deal with digital music on the internet.

"""
Even though we shouldn't be, we're actually a little shocked. We'd always assumed the labels had met with a team of technology experts in the late nineties and ignored their advice, but it turns out they never even got that far -- they didn't even try! Understanding the Internet certainly isn't easy -- especially for an industry run by a bunch of technology-averse sexagenarians -- but it's definitely not impossible. The original Napster hit its peak in 1999 -- kids born since then have hacked into CIA computers. Surely it wouldn't have taken someone at Universal more than a month or two to learn enough about the Internet to know who to call to answer a few questions. They didn't even have any geeky interns? We give this industry six months to live.
"""

link

November 27, 2007

http://www.epochconverter.com

I find it wonderful and amazing that someone would decide to create a page dedicated to epoch time conversions. http://www.epochconverter.com

November 23, 2007

Untitled

IMG_1392.JPG O' HAI I has cuddle.

A really big machine

I have been looking for a new project to work on. Last Saturday I went to a party on the SS Jeremiah O'Brien. The engine room was amazing. They have an all-volunteer staff and I am thinking of seeing if I can help out on the engine. It is a four-story closed-loop steam engine that uses sea water to recondense the steam back into water before it goes back into the boiler. It looks like a whole lot of greasy fun to me. I have some more photos of it here.

May 16, 2007

SRL: Getting Ready for Maker Faire

IMG_1370

IMG_1173

IMG_1199

IMG_1264

Work on machines and props for Maker Faire has been steady; I've been in the SRL shop almost every night for over a week. Making parts for the Bombloader-Mr. Satan hybrid machine, doing finish work on the solid steel Satan head itself, assisting Mark Pauline whenever needed, and a variety of other parts-making odds and ends that has me on a lathe, scavenging metal stock from the racks, and plenty of time in front of the big Marvel saw.

March 22, 2007

Significantly Advanced

I was chatting with my friend Kevin and he said this great thing:


I guess if any significantly advanced technology is indistinguishable from magic then the engineers are indistinguishable from wizards

The first part is quoting Clarke's three laws. I would word it just a little differently:

If any significantly advanced technology is indistinguishable from magic then significantly advanced engineers must be indistinguishable from wizards.

Let the pompous nerdiness begin.

March 9, 2007

How to get hurt

I was discussing relationships with the lovely Violet Blue and we came up with the rules for the getting hurt game. Here is our list:


  1. Don't be honest with yourself or others.

  2. Give your trust to people that don't trust you.

  3. Know that you are not worth fighting for.

  4. Believe you can grow/help/save people.

  5. Believe the situation is only temporary.


Here is to hoping you play the game as well as we have.

February 19, 2007

Prep for Dorkbot #32

DSC00353

With David Fine on Sunday, getting ready to run the External Combustion Engine tonight at Dorkbot SF #32. We are trying to fix some leaky seals and update the software to finish up the aspects of the bot that we never finished for Roboexotica. This should allow us to serve more than one drink and not waste as much nitrogen when running the machine.

Details: 7:30pm Monday, 19 February 2007. At: Encounter Studios, 555 DeHaro Suite 120, San Francisco, CA. Free admission, donations requested. Main guests include Greg Leyh and Monochrom.

February 18, 2007

SRL: Shop Days

DSC06995.jpg

DSC06998.jpg

Pulling the burned drive motor off the V1 rocket engine at the SRL shop.

The V1 damage occurred during the Aug 2006 San Jose show when the Inch Worm pinched the power lines going into the drive motor. The parts are from an electric forklift. The repair attempt, now that the burnt-up motor is removed, will be to fit in a motor and controller Mark had squirreled away. They don't quite match but are made by the same manufacturer so we hope it can be made to fit.

SRL: San Jose Show and Machine Operation

Pictured: setup for the Survival Research Laboratories San Jose show (2006), operating the Running Machine in a field test before the show.

Unfortunately I only got to run the machine before the show. Violet was first at the controls and before I got a chance at them a conspiracy between the much maligned "Kimric Cart" and the Tesla coil killed the computer on this majestic machine. Not all bad; I got to actually watch the rest of the show and I am incredibly privileged to have gotten to operate such a wonder if only during practice.

* SRL San Jose show media page with images and video here; crew credits are here.

External Combustion Engine: Roboexotica

DSC07817

The first run for the External Combustion Engine at Roboexotica 2006 in Vienna, Austria was a success, and took home the award for category: Cocktail Mixing.

The Roboexotica program description read:

A very simple drink delivery system, controlled by RFID tags and which operates using pressurized liquids passing through fuel injectors to provide humans with liquid "fuel". When an RFID tag is placed down on the controller, the machine reads the tag to identify which drink to make and makes the drink associated with the particular tag by emitting bursts of liquid from the fuel injectors.

Vienna was a blast and the show was tons of fun. I made many liters of a drink that was equal parts dark rum, vodka, and dark berry juice. The drink looked like gasoline, was strong enough to serve in shots, and was enjoyed by those who drank it. We only made one drink, as I did not have time to rewrite the software. The code in the computer was just there to test that everything was working and was never meant to deliver a drink. You might notice that there were four injectors and only three ingredients. That was due to the fact that one just did not work after I unpacked the robot from my luggage. I think in the end that was rather fortuitous, as I really like that the one drink it made was so fuel-like.

* See all Flickr photos of the External Combustion Engine here.

* Watch the Roboexotica Geek Entertainment TV episode featuring the machine in action, and a full explanation of how it works, RFID, fuel injectors, and all.