For the past couple of weeks, I’ve been using VMware on Vista to load a saved Windows XP image in which I do some specialized work. I’ve been frustrated because every time I start the image, my CPU bounces to 100% – about 50% for the VMware process and about 50% for the mysterious Mr. SVCHOST.EXE.
If I manually kill the SVCHOST process, my network goes away. One or two cycles of diagnose/repair usually fixes it, and then I’m good to go for the rest of the day.
I resolved this morning to get to the bottom of it, and I believe the culprit is the ReadyBoost service. After disabling that service and restarting the machine, no more SVCHOST bullshit when I start the VM. I don’t use the ReadyBoost capabilities anyway – I have 4 GB of physical RAM, which is fine for me – so disabling it doesn’t appear to pose any problems.
I’m working on a couple of problems for which the AI technique of Evolution Strategies is a perfect fit. My own AI/EC background is mostly in genetic algorithms (Holland) and genetic programming (Koza); ESs were always “that German thing”. But ESs have gained popularity as a general problem-solving paradigm, and for my current specific problem this approach is great.
Here’s the general form of the problem: Attempt to describe the mean and standard deviation of several weakly correlated outputs in the range [-1.0, 1.0], assuming you don’t know anything about the output values to begin with. Experimental results will (should?) tell you everything you need to know.
So, start with a uniform input distribution. Run the inputs through the model and calculate the outputs. Evolve the distribution of the next round of inputs based on the feedback. Wash, rinse, repeat. You may eventually get to a normal or pseudo-normal distribution if there is a single “correct” output.
ES allows you to coevolve the mutation function(s) (the mean and standard deviation of the inputs) as you go. The idea is that I can present more and more specific sets of inputs and arrive at very neat, very precise measurements of the output parameter(s) using this method.
Representation is a little heavier than normal using this approach – roughly 3X – but as the man said, if you don’t particularly care about the results, you can get a program to run as fast as you want.
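The loop above is easy to sketch in code. Here’s a minimal (1, λ) evolution strategy with the classic log-normal self-adaptation of the step size. To be clear, this is a sketch under my own assumptions, not the actual problem setup: `model` is a stand-in black box, the fitness (drive the output toward zero) is a placeholder, and the constants are the textbook defaults for one variable.

```python
import math
import random

def evolve_input_distribution(model, generations=200, lam=20, rng=random):
    """Sketch of a (1, lambda) ES with self-adaptive step size.

    `model` is a hypothetical black box returning an output in
    [-1.0, 1.0].  As a placeholder, fitness is |model(x)| -- i.e. we
    hunt for inputs that drive the output toward zero; real feedback
    from the experiment would replace this.

    Returns the final (mean, sigma) of the evolved input distribution.
    """
    tau = 1.0 / math.sqrt(2.0)                 # learning rate for sigma
    mean, sigma = rng.uniform(-1.0, 1.0), 1.0  # start wide
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            # Mutate the strategy parameter first (log-normal rule),
            # then use it to mutate the object variable.
            s = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
            x = mean + s * rng.gauss(0.0, 1.0)
            offspring.append((abs(model(x)), x, s))
        # Comma selection: the best offspring replaces the parent.
        _, mean, sigma = min(offspring)
    return mean, sigma
```

Note that the step size `sigma` rides along with the solution and gets selected right alongside it – that’s the coevolution of the mutation function, and it’s what shrinks the input distribution from uniform-ish toward a tight pseudo-normal one.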
I’ve been re-reading some AI texts lately and have been reminded that being smart does not necessarily mean you can write clearly. This is a nearly disqualifying flaw in computer science teaching texts.
Consider the following explanation of the edge crossover algorithm from “Introduction to Evolutionary Computing”, by A. E. Eiben and J. E. Smith. Highlights are my own.
1. Construct edge table
2. Pick an initial element at random and put it in the offspring
3. Set the variable current_element = entry
4. Remove all references to current_element from the table
5. Examine list for current_element
6. (a) If there is a common edge, pick that to be the next element
   (b) Otherwise pick the entry in the list which itself has the shortest list
   (c) Ties are split at random
7. In the case of reaching an empty list, the other end of the offspring is examined for extension; otherwise a new element is chosen at random.
I’ve highlighted the problems in yellow. First, the terms entry and list are not defined prior to, or within, the algorithm. Second, we started out picking elements for the variable current_element; in 6(a) we are suddenly picking edges? That doesn’t make sense.
You can make the reasonable argument that I’m being a touch obtuse – yes, you can ferret out the meaning if you look at it hard enough, but my point is that writers of teaching texts should demonstrate extra care in their explanations. I don’t want to have to pick apart unimportant details of algorithms – I would like to focus on the meaning or relationships or contexts around the algorithm under discussion.
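To show what I mean by picking the algorithm apart, here is my reading of it as runnable code. This is a sketch under my own interpretation: I’m assuming an “entry” is an element appearing in a node’s edge list, a “common edge” is one present in both parents (an element listed twice), and the function name and tie-breaking details are mine, not the book’s.

```python
import random

def edge_crossover(p1, p2, rng=random):
    """Edge crossover for two parent permutations (tours).

    Builds an edge table mapping each element to its neighbours in
    both parents, then greedily builds one offspring: prefer common
    edges, otherwise the neighbour with the shortest edge list.
    """
    n = len(p1)
    # Edge table: element -> list of neighbours.  An element that
    # appears twice in a list is a common edge (present in both parents).
    table = {x: [] for x in p1}
    for parent in (p1, p2):
        for i, x in enumerate(parent):
            table[x].append(parent[i - 1])
            table[x].append(parent[(i + 1) % n])

    offspring = []
    current = rng.choice(p1)  # step 2: random initial element
    while True:
        offspring.append(current)
        # Step 4: remove all references to current from the table.
        for edges in table.values():
            while current in edges:
                edges.remove(current)
        if len(offspring) == n:
            return offspring
        edges = table[current]
        if edges:
            # Step 6(a): a common edge is an element listed twice.
            common = [x for x in edges if edges.count(x) > 1]
            if common:
                current = common[0]
            else:
                # Step 6(b)/(c): shortest own edge list; ties at random.
                shortest = min(len(set(table[x])) for x in edges)
                current = rng.choice(
                    [x for x in set(edges) if len(set(table[x])) == shortest])
        else:
            # Step 7 (simplified): pick a new unused element at random
            # instead of extending from the other end of the offspring.
            current = rng.choice([x for x in table if x not in offspring])
```

Writing it out this way took longer than it should have – which is exactly my complaint about the original text.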
Welcome to the New Era of Cloud Computing
Google, Fremont Campus 4/30/2008
6:30 PM to 9:00 PM
New technologies for large-scale data storage and processing are allowing companies to manage ever-increasing data set sizes. Scalable “cloud computing” technologies offer low-overhead ways to host your products, while ensuring that your computing base can adapt to changing needs. Open source tools such as Hadoop provide low-cost but high-powered platforms on which to develop your systems. This evening presentation, aimed at technology decision-makers of local high-tech corporations, will explain what you need to know to engineer reliable, scalable distributed systems to manage your data. The presentation will address the following topics:
What is changing about data availability today?
What is cloud computing, and in what form is it available to your company?
What systems has Google developed to manage large-scale data, and what makes them unique?
What open source systems can provide these benefits to you?
Speaker: Aaron Kimball, Sr. Consultant, Spinnaker Labs, Inc.
Aaron Kimball is a leading authority on Hadoop-based system deployment. He provides advice, system development, and training to corporations and academic institutions worldwide. In 2007 he developed and taught a new undergraduate course in distributed computing with Hadoop at the University of Washington; this curriculum forms the basis for new courses being presented at top-tier universities across America and around the globe.
I’ve been browsing through the Google App Engine documentation and am violently resisting the urge to classify or judge the Python software development environment based on what I see here. It seems like a bit of a toy language, and comparing it to what I know best – the Microsoft development environment – it feels roughly on par with Microsoft circa 2001-2002: late ASP, very early C#/.NET. The templating system in particular brings back very many bad memories of “classic” ASP in the pre-.NET days.
Having said that, the “Hello World” examples for any language – even “Hello, world” with authentication, templating, database access, etc. – are all fairly simple by necessity.
I’d like to get my hands on some robust Python web code to see what I’m (most likely) missing. The only real previous exposure I have to Python was in reading the book Programming Collective Intelligence; and while those examples were neat, most of them could have been written in just about any language – i.e. they were just algorithms. As a practicing software developer, I need (or rather, want) a complete development environment that not only allows robustness, but in fact encourages it. I’ll use an analogy: when I’m walking, I don’t want to have to look down at every step to make sure my shoelaces are still tied.
Right now I’m wondering what a tightly wound developer from, say, the Eiffel camp would have to say about Python if her only exposure to it was via these Google docs. Yowza!
This article was written by John M Willis, and has a very different perspective from most of the blinkered, navel-gazing Web 2.0-type blogs. Willis appears to have been around the block and gives a brief yet extremely informative explanation of the current state of “cloud computing”.
If you’re interested in putting the recent Google announcement into context, read this article.
First, a 1-hour video talk given last summer by Jeff Dean about Google’s overall distributed architecture, including Google File System, MapReduce, and BigTable:
Next, a website I found called highscalability.com which talks about a lot of these topics in a blog format. There’s an interesting summary of Google’s architecture with links here. Ironically, this site seems to be down/overloaded a lot.
Next, a whitepaper on BigTable. Lots of details for the inquiring mind, but still approachable for a software person who is not expert in distributed systems, or BigTable in particular. This was linked from the TechCrunch article.
Finally, there’s this separate 1-hour video, also with Jeff Dean, that was given in 2005 at the UW.
I haven’t actually watched this one yet, having opted to watch the 2007 one linked earlier.
Have fun! P.S. I would appreciate notes about other good BigTable orientation information.
Finally, Bigtable supports the execution of client-supplied scripts in the address spaces of the servers. The scripts are written in a language developed at Google for processing data called Sawzall. At the moment, our Sawzall-based API does not allow client scripts to write back into Bigtable, but it does allow various forms of data transformation, filtering based on arbitrary expressions, and summarization via a variety of operators.
Hmmm… this is interesting. Drop some data into BigTable, tie it to a Sawzall script you’ve created – but how do you get the results back, if Sawzall can’t write _into_ BigTable? Have to figure that one out.
For a computationally intensive product like the one I’m developing, this is very attractive. And I don’t have to switch platforms like I would to get cloud processing done in Amazon’s EC2. I want to find out more about Sawzall.
I say “killer” only because if Google gets in, it’s going to be good. BigTable, an internal Google database product that they use to support their fast read/writes on petabytes of data (yes, peta-), is going to be released as a consumer offering in the same mode as Amazon’s SimpleDB. See the TechCrunch writeup here.
Good news for web startups? Certainly. Good news for Amazon? Probably, only insofar as a new industry – cloud computing – will support lots of competitors, and Google getting in only further validates the concept (as if it needed validating to begin with).
If Google can make it as easy to use as their other consumer offerings, like Maps, then we’re all in for a treat.