Mystery & Wonder

According to Andrew Sullivan, Alexis Madrigal claims that flocking behaviour is “… a beautiful phenomenon to behold. And neither biologists nor anyone else can yet explain how starlings seem to process information and act on it so quickly.”

That second sentence is just false, as even a quick visit to Wikipedia is sufficient to discover: Current research shows that this vastly complex behaviour requires no interaction between all points, and no orchestration by some unseen hand.

Flocking behaviour can be simulated in computers by creating groups of simple bots, each of which responds independently to three simple rules:

1) Separation – avoid crowding neighbors (short range repulsion)

2) Alignment – steer towards average heading of neighbors

3) Cohesion – steer towards average position of neighbors (long range attraction)
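For the curious, the three rules above can be sketched in a few dozen lines of JavaScript. This is only an illustration under stated assumptions: the constants (RADIUS, TOO_CLOSE and the three weights) and the step function are made up for this example and are not taken from any particular simulation.

```javascript
// Illustrative constants; tune to taste.
const RADIUS = 50;          // how far a bot can see its neighbours
const TOO_CLOSE = 10;       // separation applies inside this distance
const W_SEPARATION = 0.05;
const W_ALIGNMENT = 0.05;
const W_COHESION = 0.005;

// Advance every bot one tick. Each bot is { x, y, vx, vy } and looks only
// at neighbours within RADIUS; nobody sees the whole flock.
function step(bots) {
  return bots.map((me) => {
    let count = 0;
    let sumX = 0, sumY = 0, sumVx = 0, sumVy = 0, awayX = 0, awayY = 0;
    for (const other of bots) {
      if (other === me) continue;
      const dx = other.x - me.x, dy = other.y - me.y;
      const dist = Math.hypot(dx, dy);
      if (dist > RADIUS) continue;
      count++;
      sumX += other.x; sumY += other.y;
      sumVx += other.vx; sumVy += other.vy;
      if (dist < TOO_CLOSE) { awayX -= dx; awayY -= dy; }  // 1) separation
    }
    let vx = me.vx, vy = me.vy;
    if (count > 0) {
      vx += W_SEPARATION * awayX
          + W_ALIGNMENT * (sumVx / count - me.vx)   // 2) alignment
          + W_COHESION * (sumX / count - me.x);     // 3) cohesion
      vy += W_SEPARATION * awayY
          + W_ALIGNMENT * (sumVy / count - me.vy)
          + W_COHESION * (sumY / count - me.y);
    }
    return { x: me.x + vx, y: me.y + vy, vx, vy };
  });
}
```

Each bot computes its next move from the previous tick's positions only, so there is no leader and no global plan; calling step repeatedly on a few hundred randomly placed bots is enough to see groups form.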

Some researchers have even gone so far as to create real, flying drones that exhibit this behaviour.

The miracle is not that this grand ballet is so complex, but that it’s so damn simple in its essence.

Look, I marvel just as much as the next person when watching vast flocks of starlings. And there are few things more graceful and poignant than an entire school of sardines arcing over the waves in consecutive leaps as they flee from predators. There is little in life so exhilarating as being engulfed in a pocket of azure space as a school of reef fish flows soundlessly around you.

These are all examples of simple creatures following simple rules, collectively iterating and permuting in patterns whose complexity the human mind finds attractive, even enthralling. Because it cannot follow the linear progression of individual acts in such a vastly parallel pattern, the brain hits the overload switch, which results in our sense of wonder.

It is, almost literally, mind candy. But that does NOT make it a mystery.

I’m not asking that we put aside our wonder, but can we please accept that many of these so-called mysteries are NOT mysterious. (Well, not any longer, anyway.) I’m as big a fan of exaltation as the next person, but I cringe when we allow it to curb our perceptions and our ability to learn.

Find Duplicate File Names in CouchDB

I was stumped for a bit, trying to figure out how to help my editorial staff avoid uploading the same file twice. In a repository spanning tens of thousands of titles in over a hundred different collections, our staff can’t easily tell whether a document is already in a collection or not.

Turns out that finding duplicate attachments is fairly easy. First create the view:

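function(doc) {
  // Only documents that carry attachments are of interest.
  if (doc._attachments) {
    // One row per attachment, keyed by [collection, file name];
    // the value is the id of the document that holds the file.
    for (var name in doc._attachments) {
      emit([doc.collection, name], doc._id);
    }
  }
}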

Each row the view returns carries a composite key that looks like this:

["collection name", "filename.rtf"]

So all I have to do to find the duplicates is query that view using the composite key and see if it returns any rows:

http://my.couchdb.server:5984/database-name/_design/my-listings/_view/attachment-exists?key=["collection name","filename.rtf"]
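Typed by hand, the key has to be valid JSON (straight double quotes) and should be URL-encoded. A minimal sketch of building that query and reading the answer follows; the helper names attachmentExistsUrl and isDuplicate are hypothetical, and the server, database and design-document names are just the placeholders used above.

```javascript
// Hypothetical helper: builds the view query for one [collection, filename] pair.
function attachmentExistsUrl(databaseUrl, collection, filename) {
  // CouchDB expects the key as JSON; encodeURIComponent makes the
  // brackets, quotes, commas and spaces safe to put in a URL.
  const key = encodeURIComponent(JSON.stringify([collection, filename]));
  return databaseUrl + "/_design/my-listings/_view/attachment-exists?key=" + key;
}

// Hypothetical helper: a view response holds its matches in "rows".
// Any row at all means a file of that name is already in the collection.
function isDuplicate(viewResponse) {
  return Array.isArray(viewResponse.rows) && viewResponse.rows.length > 0;
}

console.log(attachmentExistsUrl(
  "http://my.couchdb.server:5984/database-name",
  "collection name",
  "filename.rtf"
));
```

Fetching that URL and passing the parsed JSON body to isDuplicate is all the upload form would need to do before accepting a file.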

I could do the same with MD5 checksums, too, but I won’t. The problem is that even a single character change is enough to make two documents different. So if someone opens their copy of a file and Word changes the metadata in it, it’s no longer byte-for-byte identical, even though the text has not changed. This means that the number of false negatives (i.e. duplicate files that are NOT found) would be too high for people to rely on.
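To make the checksum problem concrete, here is a small Node.js sketch. The two strings are invented stand-ins for a file before and after a word processor rewrites its metadata; md5Hex is a hypothetical helper around Node's built-in crypto module.

```javascript
const crypto = require("crypto");

// Hex-encoded MD5 digest of a string.
function md5Hex(text) {
  return crypto.createHash("md5").update(text, "utf8").digest("hex");
}

// Same body text, different (made-up) "last saved" stamp.
const original = "saved:2011-07-01\nThe quick brown fox.";
const reopened = "saved:2011-07-02\nThe quick brown fox.";

// One changed character is enough for the checksums to disagree.
console.log(md5Hex(original) === md5Hex(reopened)); // false
```

A checksum can only ever answer "are these bytes identical?", which is a stricter question than the one the editorial staff are asking.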

What I’d really like to find is an algorithm that determines whether the textual content of two documents is significantly similar….
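One commonly used starting point, offered here only as a sketch and not as something already wired into the repository, is to compare overlapping runs of words ("shingles") and score the overlap with Jaccard similarity. The function names, the shingle length of 3 and the ASCII-only word pattern are all assumptions made for illustration.

```javascript
// Break text into overlapping runs of k words ("shingles").
// Assumption: words are ASCII letters, digits and apostrophes only.
function shingles(text, k = 3) {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const out = new Set();
  for (let i = 0; i + k <= words.length; i++) {
    out.add(words.slice(i, i + k).join(" "));
  }
  return out;
}

// Jaccard similarity: shared shingles divided by all distinct shingles.
// 1 means the same word sequences, 0 means nothing in common.
function similarity(a, b, k = 3) {
  const sa = shingles(a, k);
  const sb = shingles(b, k);
  if (sa.size === 0 && sb.size === 0) return 1;
  let shared = 0;
  for (const s of sa) if (sb.has(s)) shared++;
  return shared / (sa.size + sb.size - shared);
}
```

Because punctuation and case are thrown away before comparing, two copies that differ only in metadata or formatting score 1, while a lightly edited copy scores somewhere below that. What threshold counts as "significantly similar" would have to be tuned against real documents.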

On Pseudonymity

My friend Skud (yes, Skud) recently had her Google+ account suspended, apparently for not using her ‘real’ name. The section of Google’s privacy policy dealing with the issue of names says only this:

To help fight spam and prevent fake profiles, use the name your friends, family or co-workers usually call you. For example, if your full legal name is Charles Jones Jr. but you normally use Chuck Jones or Junior Jones, either of those would be acceptable.

Audrey Watters at ReadWriteWeb got a little further clarification from a Google spokesperson concerning Google Profiles and the use of real names:

“We are not requiring people to use their ‘real name’, but rather they need their Google profile to include the name they commonly go by in daily life. I know that sounds like the same thing, but there are some differences. For a hypothetical example, Samuel Clemens could choose to be known as ‘Mark Twain,’ although we wouldn’t allow him to go by Authordude88. And for a real life example, 50 Cent is using Google+, after we verified that this is the name he is commonly referred to. More details can be found here.”

That page goes on to say that your name should use your first and last names, avoid ‘unusual’ characters (more about this below) and that your profile should represent only one person.

There are numerous problems with this policy which, taken together, make it impossible to implement it consistently or, indeed, objectively. Arguably, this policy would have disallowed some or all of the following:

Jesus Christ: ‘Christ’ is a title, not an actual name
Buddha: It’s really a title, and it’s only one word
Pol Pot, Lenin & Stalin: All noms de guerre, associated with illegal and subversive activities at some point in history.
The Apostle Paul: He was ‘really’ Saul
Socrates: What, no last name?
Ellery Queen: ‘He’ is actually a ‘they‘.
Acton, Currer and Ellis Bell: The Bronte sisters, who hid their identities (and location) to avoid scandal in their community
George Eliot and George Sand: Just a couple of the most notable women who could only be taken seriously after assuming a male identity

I could go on at great length, but suffice it to say that there are problems. You’ll note, by the way, that many of the names listed above refer to individuals who were guilty of subversive and often illegal activities. In many cases, too, there was a point in time where these names were not commonly known, or were disputed (even proscribed) by large segments of society, or by the powers that be.

Let me try to make these apparently silly examples clearer. It’s easy, with the benefit of hindsight to say, “Dude, that’s JESUS. Everybody knows he’s the Christ.” Well, that may be true now, but what about when he was some misfit wandering from town to town, pissing off a lot of Pharisees in the process? And yes, knowing what we know now, maybe we wouldn’t want to give a voice to Pol Pot, Lenin or Stalin. But how would we have felt about them in the early years of the 20th Century?

My question is: Are we on the side of the Pharisees, the Tsars and the Cambodian despots? Because that’s who we’re helping here, metaphorically speaking.

I’m not advocating taking a particular side. I’m suggesting exactly the opposite – not taking sides. That’s why I deliberately included some decidedly contentious figures in the list. (I could just as easily have included the authors of the Federalist Papers.) I just want to know that there’s room in our society for gadflies like Socrates, that it’s okay for some as-yet-unknown literary genius to speak freely and loud.

(And that, yes, even the soon-to-be villains can be captured in the public dialogue. There’s actually an argument to be made for listening to nuts like bin Laden and Breivik, in order that we better understand – and engage – our enemy.)

There are technical problems with any set of rules applying to names. As Patrick McKenzie eloquently demonstrates, just about any rule you think might apply to names actually doesn’t. Furthermore, the rationale that disallowing pseudonyms would have any effect whatsoever on spam and/or civility in public discourse, let alone that it will ‘help people know who they’re talking to,’ is entirely unproven.

But the issue is bigger than just technical. Skud writes that disallowing pseudonymity can be discriminatory and downright dangerous. The fact that her argument isn’t comprehensive makes it all the more compelling.

Throughout history, and for countless reasons, the use of pseudonyms and the appropriation of unofficial names are common, reputable and widely accepted practices.

One of the most common responses to these (and other) objections can be stated succinctly enough: Google’s Service – Google’s Rules. Fair enough, but let’s consider the implications of this. If we as a society allow ourselves to be utterly circumscribed by corporate policies over which we have no control (and which, as here, are pretty much arbitrary in nature), we’re in effect voting ourselves back into feudalism, where the rule of law becomes meaningless – or rather, indistinguishable from fiat.

I know some of you are writhing in your chairs right now, waiting to shout, “Oh come on, Crumb! Lighten up. This is a bloody social network we’re talking about, not some proletarian revolutionary struggle.” Well, no. This is a social network, and if it wants to reflect society then it needs to bloody well reflect it. In many parts of the world, just hanging out with your buddies on a service like this can get you into a lot of trouble.

Identity matters, for political, economical, social and philosophical reasons. The ability to define one’s identity freely is a fundamental human right. Google’s aim is to reduce bad behaviour, and that’s laudable. But if they want to do it right, they should focus on behaviour, not practices that are only tangentially linked to the problem.

If Google really wants their network to reflect society rather than deform it, they need to back off the name issue and look at fostering a culture of respect and civility instead.

Canonical is Failing

A word of advice to FOSS geeks:

If you must recommend Ubuntu Linux to others, recommend nothing later than 10.04, the last LTS release.

10.10 saw a number of minor but irritating bugs creep in that show a significant shortage of testing and forethought. There were countless small things like context menus no longer working after returning from a suspended state or new window positioning that’s completely counter-intuitive. Some of them, like changing sides for window buttons or listing indecipherable package descriptions above package names in Update Manager, were deliberate (and conceivably, in some universe, necessary), but most of the changes were clearly mistakes. When these are combined with long-standing bugs (like Network Manager arbitrarily deciding to disable the Save button) and inconsistencies, they begin to weigh against Ubuntu’s many virtues.

In 11.04, Unity, combined with an increase in the number of stupid bugs (that spiffy state-of-the-machine motd message is FUBAR’ed now on console login) clearly indicates that Ubuntu is more interested in new and shiny than they are in quality. A quick scan of Launchpad (itself a new product designed to simplify bug maintenance and supplant the competition, but which has done neither) shows that there are, on average, 100 open bugs per project.

Ubuntu is slipping out of control. Canonical have stopped listening to and – more importantly – working with the community. The number of defects is growing, but Canonical’s response is to make it harder for mere mortals to submit bugs. They seem to think that strong guidance is needed for their product to grow in new and interesting ways. Fair enough, but they’re confusing leadership with control. They’re simply imposing their views because they don’t value the discussion. They’re treating criticism as opposition and shutting themselves off from valid feedback.

Worse, they simply don’t have the number of skilled developers they need to achieve their goals. When I look at the bug queues on some packages, I shudder in sympathy with the poor souls who are expected to wrangle them. Canonical is clearly embarked on an impossible task, but nobody’s either got the guts or the vision to spell this out to Shuttleworth and co.

Getting buy-in and active participation from the community is a pain in the arse at the best of times, but the alternative is far worse. Heaven knows that the GNOME dev camp are… special, to be nice. But it’s clear that, given the choice between getting a partial but workable success through compromise or taking their ball and going home, Canonical has consistently chosen the latter.

This cannot end well. It will, however, end sooner rather than later.

The Wealthy Programmer

In discussion today about programming for money – as opposed to programming for the love of it, or helping to change the shape of modern technology – someone made the following point:

I’d have thought striving to be independently wealthy would be an admirable goal – it’s a lot easier to be a philanthropist when you don’t have to worry about the roof over your head and where your next meal is coming from.

You’d have thought, but you’d have been wrong.

The pursuit and acquisition of wealth generally breeds greater stress and worry rather than less. Granted, there is a level of income below which one struggles constantly to manage even the most basic aspects of daily living.

Having lived on both sides of the divide, I can say with some assurance that living in poverty is debilitating, but so is significant wealth.

The one lesson of any value I’ve learned is that if you’re really serious about helping others (or helping make important things happen), you’re doing it already. Opportunities tend to look for people willing to accept them. You don’t have to be rich or powerful to achieve important things. Most of the time, you’ll find yourself pitted against the rich and powerful – at least you will if what you’re doing represents any sort of change. Even then, there are always influential allies to be found. Put in enough hours, demonstrate – no, prove – your abilities and Good Things do happen.

But here’s the catch. To do so is to accept uncertainty and risk as your constant companions. You are guaranteed to fail more than you succeed. Every victory, save a very choice few, will be temporary or mitigated by compromise. Your own needs and satisfaction will always take second place to those of others. You’ll find yourself – as I do – older, wiser, largely contented, but with very little to guarantee a contented, comfortable retirement.

All of this, of course, runs counter to the American myth of Success, where the sole measure of influence and importance is wealth. Rightly or wrongly, it highlights people like Steve Jobs, Bill Gates and Mark Zuckerberg, relegating Knuth, Woz, Muhammad Yunus and countless other more meritorious figures to the shadows. This is a distortion. It’s not false, but it’s fake.

In rare cases, wealth will accompany accomplishment, but that’s not always the case, and if you let the former stand for the latter, that’s all you’ll have. As a wise man once said to me, ‘If you go into the hills looking for gold, all you’ll find is gold.’

Why China Will Soon Dominate the World

Because nobody can stand in the way of their Superior Blur Ray Designde MP5 technology with capacities Up To 1 Tera Gig!!!

A Novel in Three Links

This + this + this = an opportunity to change the way we communicate, and to change history as well.

The freedom that we experienced on the Internet of the ’90s is waning. Governments and commercial interests take ever-increasing steps to circumscribe people’s ability to communicate digitally. The only way to change this tide from ebb to flood is to fulfill a promise that was first made in the ’90s.

We need to disintermediate the network. It’s an ugly duckling of a word, but cutting out the middle man matters more now than ever.

As long as the cables, wires and frequencies over which we communicate are susceptible to being controlled, curtailed or even disconnected when the things we say -or the way we say them- become upsetting, we will find ourselves increasingly confined.

As I said during an Internet policy session yesterday, if you ask anyone –anyone– whether there should be limits on Behaviour X on the Internet, the answer will always be a resounding Yes. That’s not a problem in and of itself, because X is usually anti-social and contrary to the public good. The problem is that anything capable of curtailing Behaviour X can be brought to bear on Behaviours A through W as well.

The only way out of this is to provide the technical means to do what we have always done in democratic societies: Keep our private discussions private and our public discussions free.

For the former we at last have all the ingredients we need:

Gigabit wifi – We can finally start thinking about getting decent performance out of wireless data transmission, meaning that we can worry a little less about putting a lot of people onto a single wifi network;
Wireless Mesh Networks – Enough with the telcos; we can now start looking at creating ad hoc, self-organising networks, relegating the role of the data carriers to one similar to power and water utilities;
Secure Voice Communications – Security expert Moxie Marlinspike (yeah) and a crew of like-minded individuals have floated a very useful service recently, allowing secure VOIP and SMS communications between phones. By building encryption into the bones of the app, they’ve created software that looks and acts exactly like normal calling and texting. The only difference being that, if the other person is using their RedPhone service, the entire communication remains a secret shared only by the two of you.

The ideas behind these things have been floating around for some time (the protocol underlying RedPhone has been with us since 2006), but now they’re all here in usable form.

I’ve said it before: The story of Internet freedom and online privacy will be the defining social conflict of our generation. As the peoples of the Middle East are discovering, the narrative of freedom is suspenseful, dramatic and exciting in the best and worst ways.

Whoever manages to blend these three technologies together seamlessly and easily enough for anyone to use them will assuredly be one of the main protagonists in this unfolding drama. They may not garner the celebrity of a Jobs or a Gates, but they will have the impact of a Gandhi or a King.

The Internet ≠ the Network

Douglas Rushkoff just posted a piece with which I largely agree, but which indulges in some remarkably lazy language in the process:

“Some of us might like to believe that the genie is out of the bottle and that we all have access to an unstoppable decentralized network. In reality, the internet is entirely controlled by central authorities.”

Arrgh! This kind of thing drives me crazy. If we could stop conflating the Internet (which is a combination of networking protocols) and the physical network (which is a bunch of cables and antennas and switches), we might be able to have a useful dialogue about how to reduce the Internet’s vulnerability to coercive measures by changing the shape of the network.

In the end, that’s what Rushkoff advocates; I just wish he wouldn’t muddy the water so.

Stay with me, kids; I’m going to say this again slowly: The network is the wires and antennas and stuff. The Internet is the way information is organised to travel across it.

More to the point, the Internet is a very specific way for data to travel across that network:

It doesn’t rely on a middle-man. I might choose to use Facebook for chat, but I don’t have to. I could connect straight to your computer or phone and chat away.
It doesn’t need a road map. In effect, the data packets just go hitch-hiking across the network with a sign saying ‘San José’ – or whatever.
It doesn’t see borders the same way some other network protocols do. In fact, that’s why it’s an Inter net: Because it routes traffic between different networks.
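The hitch-hiking idea can be sketched as a toy model. Everything here is hypothetical: the node names, the forwardingTables object and the deliver function exist only for illustration. The point is that each node knows nothing but which neighbour to hand the packet to next.

```javascript
// Toy network: each node maps a destination to the next neighbour only.
// No node holds the full route.
const forwardingTables = {
  home:     { "San José": "ispA" },
  ispA:     { "San José": "exchange" },
  exchange: { "San José": "ispB" },
  ispB:     { "San José": "San José" },
};

// Pass a packet along hop by hop and record where it went.
function deliver(start, destination, maxHops = 16) {
  const path = [start];
  let here = start;
  while (here !== destination) {
    if (path.length > maxHops) throw new Error("too many hops");
    const next = (forwardingTables[here] || {})[destination];
    if (next === undefined) throw new Error("no route from " + here);
    path.push(next);
    here = next;
  }
  return path;
}

console.log(deliver("home", "San José").join(" -> "));
```

Note that ispA and ispB are separate networks with their own tables; the packet crosses between them without either one needing to know the other's internals.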

Once more:

Internet = you & me talking.
Network = the road system that allows you and me to get together to talk.

There. That wasn’t so hard, was it?

Oh, as long as I’m being pedantic: It’s Internet-with-a-capital-I. It’s a proper noun referring to a very specific thing. It’s like a country with all the geography taken out. It still has to have a capital.

Infowar – A Case Study

[This weekend’s Opinion column in the Daily Post]

The recent decision by the Mubarak regime in Egypt to cut off all Internet access for its citizens is a textbook example of using a silver bullet to shoot oneself in the foot.

The whys and wherefores of how they’ve gone about doing so provide a useful opportunity to understand the paradox of control over the Internet and the costs involved when governments and other actors indulge their desire to dam the torrent of information that flows across their networks.

In order to do that, we need to dispel a rather pesky myth.

Perhaps the most dangerous misconception about the Internet concerns its survivability. It’s true that, as one information activist put it, the Internet treats censorship as damage and routes around it. But that statement is predicated on the actual presence of an Internet in the first place.

That may sound like a silly statement, but the Internet might not be as enduring as many assume it to be.

While many of the software and communications protocols that define the Internet are, by design, remarkably resistant to outside control, the physical networks through which our data passes are not nearly so robust.

James Cowie, a network analyst from Renesys Corporation, has written excellent analyses of state intervention in national communications both during the post-election strife in Iran and more recently in Egypt. Using forensic evidence gathered in real time, he constructs a vivid scenario: In contrast to Iranian authorities, who elected to use physical choke-points in the communications infrastructure to reduce the flow of information to a trickle, Egyptian authorities appear to have instructed all national Internet Service Providers simply to cut all communications with the outside world.

Starting at midnight (Egyptian time) on the 27th of January 2011, Egypt’s largest ISPs began disappearing from the Internet. Within a period of about 13 minutes, they simply stopped delivering data to and from their customers.

Cowie writes:

“[T]his sequencing looks like people getting phone calls, one at a time, telling them to take themselves off the air. Not an automated system that takes all providers down at once; instead, the incumbent leads and other providers follow meekly one by one until Egypt is silenced.”

How did this happen? Every large ISP participates in a cooperative system called the Border Gateway Protocol, or BGP. BGP allows them to discover how traffic destined to a remote network should be directed. Simply put, each ISP announces which address blocks it supports. These blocks can represent tens or even hundreds of thousands of individual machine addresses.

Designed for simpler times, BGP is a trust-based protocol. It relies implicitly on the good faith of all participants to continue working. This makes it remarkably vulnerable to the machinations of states or organisations whose interests don’t align with others’. Back in 2008, Pakistan Telecom caused a furore when, for a little over 2 hours, their bungled attempt to use BGP to block YouTube domestically resulted in the site disappearing from much of the Internet.
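A toy model shows why trust matters so much here. This is a sketch, not BGP itself: real routers weigh many more attributes, and the RouteTable class, the origin names and the address blocks (taken from a range reserved for documentation) are all invented for illustration. The one real-world behaviour it assumes is that routers prefer the most specific address block that covers a destination.

```javascript
// Convert dotted-quad IPv4 text to a number.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when `ip` falls inside the block written as "a.b.c.d/len".
function inBlock(ip, block) {
  const [base, lenText] = block.split("/");
  const size = 2 ** (32 - Number(lenText));   // addresses in the block
  return Math.floor(ipToInt(ip) / size) === Math.floor(ipToInt(base) / size);
}

class RouteTable {
  constructor() { this.routes = []; }         // entries: { block, origin }
  // Announcements are believed without verification.
  announce(block, origin) { this.routes.push({ block, origin }); }
  withdraw(block, origin) {
    this.routes = this.routes.filter(
      (r) => !(r.block === block && r.origin === origin));
  }
  // The most specific covering block wins; null means unreachable.
  lookup(ip) {
    let best = null;
    for (const r of this.routes) {
      if (!inBlock(ip, r.block)) continue;
      const len = Number(r.block.split("/")[1]);
      if (best === null || len > best.len) best = { origin: r.origin, len };
    }
    return best ? best.origin : null;
  }
}

const table = new RouteTable();
table.announce("198.51.100.0/24", "video-site");
console.log(table.lookup("198.51.100.7"));      // video-site
table.announce("198.51.100.0/25", "careless-telco");
console.log(table.lookup("198.51.100.7"));      // careless-telco
table.withdraw("198.51.100.0/25", "careless-telco");
table.withdraw("198.51.100.0/24", "video-site");
console.log(table.lookup("198.51.100.7"));      // null
```

The second announcement diverts traffic simply because nobody checks whether the announcer has any right to the block, and the final withdrawals show the other failure mode: once the announcements are gone, the addresses are unreachable from anywhere.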

Just last year, a change to BGP traffic announcements resulted in about 15% of all Internet traffic being routed through networks in China for a brief period. This resulted in breathless speculation that the disruption was not accidental. Some claimed that it amounted to a reconnaissance in force, as it were, a probing of the global Internet to determine its resilience in the face of attack.

Intentional or not, these disruptions to the BGP apparatus make it abundantly clear that choke points exist on the Internet and that they are remarkably easy to subvert.

Debate continues to rage in technical circles about what can be done to mitigate BGP’s innate deficiencies. Changes will doubtless be necessary. But the liability wouldn’t be so grave if our physical communications networks weren’t so hopelessly centralised.

Egypt offers us a particularly vivid example of this. A country of over 80 million people, it has only half a dozen or so large Internet providers. Only one of them, the Noor Group, initially resisted the demand to drop services. Some have speculated that its continued online presence was due to its extensive list of blue chip clients, including many banks and the Egyptian Stock Exchange.

Ultimately, though, it was a limited victory. Noor advertised only 83 of the roughly 3500 data routes in and out of Egypt. They were eventually forced off the air a week after their IT confrères.

In Iran, population 72 million, there are only 5 significant international links, all of which flow through a single Government-run office. Such centralisation makes it easy for the state to exert its influence.

(One European-owned company, Vodafone, washed its hands of the decision to cut service to its Egyptian customers, claiming that the Mubarak regime had the legal right to issue the order. This rhetorical line apes the rationale provided by Nokia-Siemens when it was discovered that their equipment enabled Iranian authorities to block most traffic and eavesdrop on the rest.)

The Internet as a principle –that is, the idea of an open network allowing free communication regardless of source or sender– is not as popular as some might believe. It made its way into the commercial world more by stealth than by deliberation. Telcos didn’t really understand the Internet as a service; they just knew they had to offer it in order to compete.

One thing was clear to them: The sum of all services across a global network was clearly more valuable than those offered by a single provider. Equally attractive was the perception that these services came more or less for free with the connection.

But the seductive power of the Net hasn’t changed attitudes entirely.

Telecommunications companies, with a long legacy of market-controlling behaviour, still build and deploy their infrastructure using centralised models. Recently, some of them have begun lobbying for the right to exert control over the data that passes over their networks, potentially penalising services that compete with their own. Comcast, one of the largest ISPs in the US, recently got approval to acquire NBC Universal and its content-creation ecosystem, giving rise to fears that they might leverage their control over the information pipeline to dictate what passes through it.

Put simply, carriers would love nothing better than to go back to the telephone service model, where fees are based on where you are and who you talk to, with no conversation possible unless you’ve paid your toll.

The principle of an end-to-end network –that is, one that allows direct, unmediated connections between two parties– militates strongly in the opposite direction. Its appeal is remarkably seductive, leading most Internet users to view with displeasure the telcos’ (or governments’) desire to mediate communications.

Renesys quite rightly remarks that if cuts to Egypt’s Internet had lasted much longer, the reduction in commercial activity could have been catastrophic for the nation.

Furthermore, Cowie remarks, it wasn’t only Egypt’s pipelines that were at risk:

“[T]he majority of Internet connectivity between Europe and Asia actually passes through Egypt. The Gulf States, in particular, depend critically on the Egyptian fiber-optic corridor for their connectivity to world markets.

“Are the folks at Davos thinking about this? They should be.”

In a perfect world, consumer choice and basic business commonsense would always win. But the problem is that centralised networks not only cost a lot of money (placing their design and construction into the hands of the most powerful), they make a lot of money, too.

In monetary and political terms, the wealth of the network itself tends to pool rather than to flow.

A fundamental change has already overtaken the public’s perception about the value and nature of digital communications. Passive consumption of news through the television is considered passé, or at least diminished in relation to the sharing of photos, videos and words across the Internet.

As individual control over the flow of information rises, central control wanes. And this, obviously, is the crux of the dilemma facing businesses and governments across North Africa and throughout the world. They are belatedly coming to realise that they are fighting a many-headed hydra. As they cut off one avenue of communication, another rears its head.

But that hydra has a body, and the body is the network itself.

As this column goes to press, it appears that Egypt’s decision to cut off the Internet failed in every important regard. One protester is reported to have said, “F*** the internet! I have not seen it since Thursday and I am not missing it.… Go tell Mubarak that the people’s revolution does not need his damn internet!”

I would be amazed, however, if this fact led other governments to act differently, should they find themselves in a similar situation. Indeed, the US Congress is currently considering legislation that would provide the President with an ‘Internet Kill Switch’ for use in case of emergency.

Likewise, I see no evidence that the ultimate futility of attempting to control the flow of information will change attitudes in the board rooms and offices where our increasingly centralised networks are planned. For telcos, the challenge is merely technical.

For the Internet –as it was originally intended– to become fully realised and fully resistant to coercion, the devices and infrastructure through which our data travels will need to reflect the same principle of decentralisation as the software and protocols we use today. That implies the construction of communications devices that are very different from the locked-in, network-centric phones, tablets and computers we’re familiar with. I can think of no short-term scenario in which the development of such products will take place in any significant way.

For some time to come, we will continue to live in a world in which the powerful continue to load silver bullets and take aim squarely at their own feet.

THE SCRIPTORVM

Collected essays, columns, etc. by Dan McGarry
