Sun, 18 Dec 2011

Google App-Engine “Doesn’t” Scale

I agree for the most part with your final conclusion: “no system that is likely to become productionized at scale should be written on App Engine.”

This is a sad conclusion to arrive at after all these years, especially when the original promise of App Engine was (essentially) “write your applications against our strange, quirky API, and they’ll scale far more cheaply and reliably than they could otherwise.”

[source…]

I’m going to have to add fuel to the fire here: 100% agreement.

GAE is “irreplaceable” in the sense that while the API has been mostly duplicated, the ability to scale that API to arbitrarily large workloads has not.

There was a blog article a long time ago that talked about designing a future language and came to the conclusion that perhaps even performance characteristics should be specified. For example, a language where sort might have n^2 vs. n log n performance characteristics will work fine for basic workloads but will wreak havoc when used at higher capacities. Specifically, after a certain point, performance expectations become part of the implementation.

So you have GAE, which has incredible “Superman” powers, faster than a speeding bullet, more powerful than a locomotive, etc. and irreplaceable for automatically scaling to arbitrarily large workloads. And therein lies the rub, as described in the linked article.

The one feature that GAE gives you, the reason you’re sacrificing everything to work in GAE’s world, is the one feature you can’t count on. You can’t count on it due ENTIRELY to the mismanagement by Google of their GAE developer community. E-N-T-I-R-E-L-Y. It is a good thing Netflix made their pricing and customer-relations screwups after Google did, otherwise I would say they had “Netflix-level” poor planning and communication.

Actually, there are two reasons. One is that GAE cannot be replaced (there is no alternate GAE provider that can scale to some arbitrary 1TB workload), and two is that Google so badly mismanaged GAE’s “transition to non-beta” product.

A few years ago I sang the praises of Google’s strategy with GAE and Adwords. I would tell people: “Where is autos.google.com?” And autos.google.com is actually google.com/search?q=autos because every auto forum is running Adwords against their content. And then they launched GAE and my expectation was that a lot of the autos websites (mysweetcorvette.com) would transition to using a GAE-based forum and Google would make yet-even-more-money hand over fist.

They give you the tools to get started capturing users and generating content, the tools to monetize it, and as your users and revenue grow you bump from the GAE free tier to the GAE pay tier. Of the $100/mo you get from Google Adwords, you start paying $1/mo to Google and earning $102 from Adwords. Then $10/mo to Google and earning $120 from Adwords as your userbase continues to grow.

But alas, this wasn’t meant to be. Google has a unique, challenging, irreplaceable developer product to use and the one thing it excels at (scaling) is the one thing that is impossible to trust.

At this point, I imagine that Amazon’s EC2 + “Individual Dumb Services” is far better compared to Google’s style (which doesn’t bode well for future Google acquisitions……). S3 is too expensive? Move to a different “I serve files from big hard-drives” provider. SimpleEmail not doing it for you? Get on board with a different “pay-to-spam” provider. EC2 doesn’t match your needs? Find a different cloud provider and have a minimal implementation up and running on them.

Amazon, by designing their services around commodity components that encourage competition and API duplication, has paradoxically made their service more architecturally robust (in the sense that a customer has tons of options for migration, price, and performance competition). Downtime(s) notwithstanding, it looks like the original blog author has hit the nail on the head.

EC2-style implementations require a bit more work up front but leave you in a more price-competitive situation if your product takes off and gets a lot of traffic. And your product won’t. 99.9% of the time you’re gonna be fine with a single box or a 4-box setup (2 frontend, 2 database). And if you do reach the giddy heights of needing GAE’s scalability, you have to begin each SEC filing with: “Assuming Google doesn’t raise prices for app-engine by ∞% like they did last year …”

Enjoy the Ferrari, Google. I guess everybody else will stick with their daily drivers.

11:57 CST | category / entries
permanent link | comments?

Mon, 21 Nov 2011

just made a vim macro to insert “carp ##” statements every five lines. #FML … Think I should remove the “use common::sense;” line?

16:48 CST | category / entries / tweets
permanent link | comments?

Sun, 20 Nov 2011

Good interview question: algorithm for minimum unique length given many md5. Bonus: given three points, which angle is closest to 90 degrees
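One way to read both questions, as a rough Python sketch. I am assuming “minimum unique length” means the shortest prefix length at which every hex digest is still distinct, and the function names and sample digests below are made up for illustration. For the bonus, the vertex whose angle has the smallest absolute cosine is the one closest to 90 degrees.

```python
import math

def min_unique_prefix_length(digests):
    """Smallest prefix length at which all distinct digests stay distinct.

    Assumes equal-length hex strings; returns 1 if there are fewer than
    two distinct digests.
    """
    unique = sorted(set(digests))
    longest_shared = 0
    # After sorting, the longest shared prefix of any pair shows up
    # between some pair of neighbours, so one pass is enough.
    for left, right in zip(unique, unique[1:]):
        shared = 0
        for a, b in zip(left, right):
            if a != b:
                break
            shared += 1
        longest_shared = max(longest_shared, shared)
    return longest_shared + 1 if len(unique) > 1 else 1

def angle_closest_to_right(points):
    """Index of the vertex (of three distinct points) nearest to 90 degrees."""
    best_index, best_abs_cos = None, math.inf
    for i, (px, py) in enumerate(points):
        (ax, ay), (bx, by) = (points[j] for j in range(3) if j != i)
        ux, uy = ax - px, ay - py
        vx, vy = bx - px, by - py
        cos_angle = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
        # |cos| shrinks to 0 as the angle approaches 90 degrees.
        if abs(cos_angle) < best_abs_cos:
            best_index, best_abs_cos = i, abs(cos_angle)
    return best_index

# Made-up, shortened "digests" just to exercise the function.
sample = ["abc1", "abd2", "xyz3"]
prefix_length = min_unique_prefix_length(sample)
right_vertex = angle_closest_to_right([(0, 0), (1, 0), (0, 1)])
```

Sorting costs O(n log n) comparisons, which is probably where an interviewer would start poking; a trie would be the other obvious answer.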

18:53 CST | category / entries / tweets
permanent link | comments?

Mon, 14 Nov 2011

Space.

Earth | Time Lapse View from Space, Fly Over | NASA, ISS from Michael König.

That is all.

11:30 CST | category / entries / links
permanent link | comments?

Tue, 08 Nov 2011

Wealth and Wants

Wealth is not the same thing as money. If you want to create wealth, it will help to understand what it is.

Wealth is the fundamental thing. Wealth is stuff we want: food, clothes, houses, cars, gadgets, travel to interesting places, and so on. There is not a fixed amount of wealth in the world. You can make more wealth.

People think that what a business does is make money. But money is just the intermediate stage— just a shorthand— for whatever people want. What most businesses really do is make wealth. They do something people want. Money is a comparatively recent invention.

A surprising number of people retain from childhood the idea that there is a fixed amount of wealth in the world. There is, in any normal family, a fixed amount of money at any moment. But that’s not the same thing.

Suppose you own a beat-up old car. Instead of sitting on your butt next summer, you could spend the time restoring your car to pristine condition. In doing so you create wealth. The world is— and you specifically are— one pristine old car richer. And not just in some metaphorical way. If you sell your car, you’ll get more for it.

In restoring your old car you have made yourself richer. You haven’t made anyone else poorer. So there is obviously not a fixed pie. And in fact, when you look at it this way, you wonder why anyone would think there was.

[source…]

I took the liberty of de-Paul-Graham-ifying a bit of this to make some of the central points more clear.

10:55 CST | category / entries / links
permanent link | comments?

Thu, 20 Oct 2011

“If you would build a ship don’t drum up men to gather wood, divide work and give orders. Instead, teach them to yearn for the endless sea.”

12:43 CST | category / entries / tweets
permanent link | comments?

Wed, 19 Oct 2011

Programming Language Roundup

I guess it’s on everybody’s mind lately, but here is an excellent overview of “new” languages to come out in the past few years. Definitely a lot more than I thought there were at first.

In this post I will provide a list of fairly new languages (let’s say 5 years with a little flex) that display interesting features and display higher-order thinking in the way that they tend toward an evolution of past learnings in programming language thinking.

[source…]

The one that sticks out to me is Shen which has a few very interesting features.

Taking a look at each point in turn, once you get past the parentheses, lisp syntax is very cool. If you try to do a good job of following functional / immutable / transformative programming you end up changing the way you write code.

// procedural style
var abc = 6;
double( abc );
triple( abc );
assert( abc == 36 );

// functional style
var abc = 6;
var def = double( abc );
var result = triple( def );
assert( 36 == result );

// "clean" functional style
var result = triple(
                 double( 6 )
             );
assert( 36 == result );

Obviously the above is a contrived example (to be even “cleaner” you would do away with the temporary “result” variable), but you can see that as you get “more functional” you naturally tend to collect parentheses, no matter the language you are working in.

The “pattern matching” aspect that I really liked from ML is pretty much syntactic sugar for if / else or case / switch but built in to the language.

I find that I’m often writing code like this:

// old style
function foo( a, b ) {
    if ( 1 == a ) {
        alert( 'foo ' + b );
    }
    else if ( 2 == a ) {
        alert( 'bar ' + b );
    }
    else {
        alert( "unknown action: " + a );
    }
}

// new style
function foo( a, b ) {
    var mapping = {
        1: function( b ) { 
              alert( 'foo ' + b );
           },
        2: function( b ) { 
              alert( 'bar ' + b );
           },
    }

    if ( mapping.hasOwnProperty( a ) ) {
        mapping[a]( b );
    }
    else {
        alert( "unknown action: " + a );
    }
}

// "better" new style
function foo( a, b ) {
    var mapping = {
        1: doAlertWithFoo_ref,
        2: doAlertWithBar_ref
    }

    if ( mapping.hasOwnProperty( a ) ) {
        mapping[a]( b );
    }
    else {
        alert( "unknown action: " + a );
    }
}

// "hypothetical" pattern-matching style
function foo( 1, b ) {
    alert( 'foo ' + b );
}
function foo( 2, b ) {
    alert( 'bar ' + b );
}
function foo( a, b ) {
    alert( "unknown action: " + a );
}

Again, an extremely contrived example, the point being that you treat conditionals as mappings instead of bare if/else clauses. Then, the only bugs you can have are “I am missing a mapping” or “My mapping is criss-crossed” or, worst of all, “My target action is buggy”.

The pattern-matching aspect of a language makes it easier (supposedly!) to reason about functions. Instead of having to read inside a function to determine how it behaves (for many if/else cases), you can promote that to the top level of function definition (in the above, read function foo( 1, b ) as: when foo is called with 1 as the first parameter).

The coup de grâce for me though is the inclusion of a version of Prolog core within the language. In the same way that regexes make string processing a million times easier, prolog makes logic processing a million times easier. And who would have thought… but logic processing turns out to be a useful thing to optimize from time to time.

You’ll have to bear with me on this one because it’s a little harder to explain. But let’s say we have a computer.

 def isComputer( x ):
    return x has memory and x has cpu and x has hard_drive

 def 4gig_o_ram is memory
 def 2gig_o_ram is memory
 def 1gig_o_ram is memory

 def amd_chip is cpu
 def intel is cpu

 def seagate is hard_drive
 def western_digital is hard_drive

 my_computer = 4gig_o_ram and intel and western_digital

 def canRunWindowsVista( x ):
     isComputer( x ) and has 4gig_o_ram

 canRunWindowsVista( my_computer )?
 => true

Again, a really contrived, terrible example with incorrect syntax. But the idea is that you set out the facts and you let the computer (90% of the time) figure out all the boring crap of making sure your arbitrary constraints aren’t violated.
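To make the pseudo-code above a little more concrete, here is the same idea as a rough sketch in plain Python. This is not Prolog (there is no unification or backtracking), and the names are just my transliteration of the example; it only shows the “state the facts, then ask a question” shape:

```python
# Facts: each known part and the kind of component it is.
KIND_OF = {
    "4gig_o_ram": "memory",
    "2gig_o_ram": "memory",
    "1gig_o_ram": "memory",
    "amd_chip": "cpu",
    "intel": "cpu",
    "seagate": "hard_drive",
    "western_digital": "hard_drive",
}

def has_kind(parts, kind):
    """True if any of the parts is a component of the given kind."""
    return any(KIND_OF.get(part) == kind for part in parts)

def is_computer(parts):
    """Rule: a computer has memory, a cpu and a hard drive."""
    return all(has_kind(parts, kind) for kind in ("memory", "cpu", "hard_drive"))

def can_run_windows_vista(parts):
    """Rule: it must be a computer and include the 4 gig part."""
    return is_computer(parts) and "4gig_o_ram" in parts

my_computer = {"4gig_o_ram", "intel", "western_digital"}
answer = can_run_windows_vista(my_computer)
```

The facts live in one table and the rules are one line each, which is most of what I am after; a real logic engine would also answer the reverse question (“which parts would make this a computer?”) without extra code.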

You run into this problem All. The. Time. when working with config files, validating function parameters, determining if a user has permission to perform an action, etc, etc, etc. Regexes are very information dense around a well-defined problem. “Prolog” is a bit more verbose around problems that you define yourself, which is my theory as to why there has never been a “like-a-regex-but-for-prolog” included in any languages.

I can tell you it excites me to see this Shen project try. Definitely worth paying attention to.

00:52 CST | category / entries
permanent link | comments?

Tue, 18 Oct 2011

Interesting #bash tip instead of sed: … | while read ; do echo $REPLY ; done http://stackoverflow.com/questions/1670577/indenting-bash-script-output/1670632#1670632

14:27 CST | category / entries / tweets
permanent link | comments?

Mon, 10 Oct 2011

On “Go” and “Dart” - Google Wants You

Inspired by Mr. KrestenKrab, I feel obligated to comment on Google’s language development activities.

Posit. They’ve released “Go” and “Dart” which are improvements to the “C-style” and “Javascript” languages respectively.

Posit. They’ve released GWT which is “Javascript for Java-devs” and they’ve mucked with “fixing” both Python and Java to run in a sandbox / secure environment.

Oh, and I almost forgot they’ve implemented a Javascript engine in V8.

Publicly they’ve talked about their internal tooling capabilities when working with source code (bytecode inspection, singleton detection, the ability to focus unit tests on potentially impacted code based on what lines were changed in a file / diff).

Unfortunately I don’t have all the references handy, but if you pay close attention to what Google is doing, you begin to see that they are most definitely on the “other side” of the bell curve from most people in their ability to navel-gaze at their product code.

There are a lot of people who are slamming Google and I can’t tell if it’s because they think it’s the cool thing to do, or they don’t understand what Google is doing, or maybe it’s because they are afraid of change?

I can’t tell what it is, but from my perspective, Google’s moves are interesting and fairly transparent.

Obviously they’ve written a lot of code and have a bunch of engineering talent. They’ve also spent a lot of time “cleaning up” other languages / language implementations (javascript-v8 / java-gae / python-gae). They also appear to have taken a hard look at their own code and found areas where C/C++ makes things difficult, as well as that red-headed stepchild: Javascript.

You might also have noticed that each of Go and Dart have “shipped” with online / live compilers exposed to the internet, pretty much from day one. That speaks volumes to their confidence in their ability to lock down the code for use with untrusted input.

Go and Dart both appear to be fairly pragmatic languages, focused on making it easier to write correct programs.

From the Go FAQ:

Go is an attempt to combine the ease of programming of an interpreted, dynamically typed language with the efficiency and safety of a statically typed, compiled language. It also aims to be modern, with support for networked and multicore computing. Finally, it is intended to be fast: it should take at most a few seconds to build a large executable on a single computer. To meet these goals required addressing a number of linguistic issues: an expressive but lightweight type system; concurrency and garbage collection; rigid dependency specification; and so on. These cannot be addressed well by libraries or tools; a new language was called for.

[source…]

From the Dart FAQ:

Create a structured yet flexible programming language for the web.

Make Dart feel familiar and natural to programmers and thus easy to learn.

[source…]

…and from their leaked document, you’ll also hear that Dart was designed to allow for more complete “tooling” (right-click => refactor) as a design goal compared to the 4-5 “basic” ways that javascript supports classes (bare objects, constructor functions, prototype inheritance, module pattern, etc, etc.).

So… why have Go and Dart “leaked” from the lab into the real world?

This is the interesting part of the Google story. In my day job, I regularly rail on people for inventing dumb libraries, NIH and the like. Right now I’m using a unit test library for perl that is in use nowhere else outside of my company.

The question I always ask: “Can I buy a book on it?” Followed immediately by: “Why is this different than what the blogs are telling me to do?” and “How long would it take somebody to ramp up if I hired them off the street?”

Sometimes there is a good reason, but more often there aren’t good answers to those questions.

So Google’s a smart company, and I’m a smart guy. That means that Google has some of those same concerns. Even if the new languages that Google has put together are materially better than the other languages out there, it still doesn’t help Google until outside people start using them.

They’ve already pushed Guice, GWT, their Java / C development standards, their testing blog, map reduce: many things they’ve learned internally they’re trying to share with the outside world so that it isn’t quite as big of a shock when fresh meat enters the grind.

If you look at how the existing programming languages were made, they each grew on a strong base (C++ on C. Java on C++. PHP on Perl [heh]. Python on ABC.) and tried to make incremental improvements yet be fundamentally “better” taking into account the shared experience that only real-world use can give.

I think that Google is following in the same tradition, and given their internal experiences, only a fool would dismiss it out of hand.

It was (kind of?) a stroke of genius when Yahoo! ditched their internal template system for PHP. I’m sure it had a profoundly positive impact on their hiring productivity: “Oh, you know the most popular web scripting language? Good, that’s the one we use.”

In the same way, if we assume that Go and Dart are in fact materially improved languages (especially if they are materially better inside of Google), it is in Google’s vested interest to get as many people as possible using it. The more someone knows before joining a company, the lower their training costs are, the higher their productivity.

Their focus with these languages (at least with Go) appears to be laser-like on reliability, ease of programmer use, and ease of inspection / whole program manipulation.

I think that the leaked memo for Dart forced Google’s hand to get something out there quicker than they really wanted. I also think that looking at Dart briefly it would benefit from similar “bold changes while in flight” as they have made to Go based on community feedback (the basic one for Dart they’ve screwed up is not tagging functions with a simple keyword- “def / sub / func / function” so you can grep for things).

Go has been out for approximately two years (November 2009) and they are closing in on their own “Go 1.0” release and have made significant changes to the language during that process.

It will probably be a while before Dart reaches anywhere near the same relevance as “Go” has (which is to say: not that much outside of Google), but I have high hopes, especially if they steward the project well.

Based on the evidence of Go’s evolution and apparent benefits over C, I would hope that Dart follows a similar path. For Google, I assume it will see a lot of use (and quickly!) because they already support cross compiling to Javascript (GWT) and already have a cross-compiler from Dart to JS.

It is not surprising that Google has released these languages. As a matter of fact, they are just following the tradition of corporations releasing languages they’ve invented and found useful (Erlang=>Ericsson, C=>Bell Labs, Java=>Sun, lots more programming language origins), but what is surprising is that there are now so few universities involved in advancing the state of the art in programming.

I wonder if we are entering a “new era” of Computer Science where pragmatism trumps expressing algorithms, or if they have somehow (necessarily) merged into one but now with corporations driving the bus.

21:14 CST | category / entries
permanent link | comments?

Wed, 05 Oct 2011

Did you know that translate.google.com does an excellent job, especially if you “help it” by looking up its alternate translations? #boggled

10:03 CST | category / entries / tweets
permanent link | comments?

Thu, 22 Sep 2011

16:28 CST | category / entries / tweets
permanent link | comments?

Thu, 08 Sep 2011

Tweet Love(tm) for DonutGames.com. Free this month on the iPhone: Rat-on-a-Scooter and Lucky Coin are good and dude writes solid code

20:26 CST | category / entries / tweets
permanent link | comments?

Fri, 02 Sep 2011

Level-headed intro to Agile / Scrum

This is probably the most level-headed introduction to scrum / agile processes I’ve ever read.

I have come to think of estimation and time tracking as being similar to code coverage. People who have never really tried using code coverage tools are quick to point out all the reasons why code coverage is a waste of time. They can describe in great detail why code coverage will not solve all their problems. They know that even 100% code coverage does not guarantee that software is correct. All of their excuses are built on facts. But it is also difficult to find someone who has used code coverage who believes it is pointless.

It is true that best practices can be taken too far. Many of them produce their greatest returns when applied in a small way. Nobody’s going to be hiring me as an agile coach, but I’ve got more experience than I did two years ago, and I’ve learned some great lessons. Nowadays, when somebody tries to tell me that estimation and time tracking are pointless, I ask them, “Have you tried these practices with 4-week sprints and a burndown chart?”

[read more]

He approaches the topic as a skeptical, skilled practitioner, and it seems his team was constantly adjusting the “final target” aim for their software release.

There was a period of several months where every time we finished a sprint we had to insert a new one to deal with the stuff that didn’t get done. Ian described sprint 15 as the first sprint where he didn’t have to redo the 1.0 plan.

This is in pretty stark contrast to most sprint-based projects I’ve worked on which have been more of “small batches, updating production” and no final marketing or release push for any of the “bigger” things that might shift / delay from sprint to sprint.

In any case, if you are an “agile skeptic”, please read the linked article, it’ll give you some healthy food for thought.

12:29 CST | category / entries / links
permanent link | comments?

Mon, 15 Aug 2011

Howard’s Tea Cakes

Preheat oven to 350 degrees

In a large bowl sift flour, baking soda, and baking powder together.

Add remaining ingredients and blend well. Dough should be soft.

Roll dough out onto a floured surface until approximately 1/4-inch thick.

Cut dough into desired shapes and bake on a slightly greased sheet for 10 to 12 minutes.

Makes 6 to 8 dozen.

16:41 CST | category / entries / recipes
permanent link | comments?

Fri, 12 Aug 2011

Sick! Search for “ county, ” on the googles and get a county map. http://www.google.com/search?q=spring+county%2C+tx

11:09 CST | category / entries / tweets
permanent link | comments?

Thu, 04 Aug 2011

Interest on “Technical Debt” increases when that debt is transferred. Tech Debt to the original developer might even feel minimal. #agile

13:05 CST | category / entries / tweets
permanent link | comments?

Sat, 30 Jul 2011

Cheddar Biscuits

Crude form of bookmarking because these look pretty good.

Mix until sticky.

Bake at 375° for 10-12 minutes.

After baking, brush with:

(make while biscuits are baking)

23:08 CST | category / entries / recipes
permanent link | comments?

Fri, 29 Jul 2011

and :help reg is my new favorite vim-ism. Somehow ia is much more comfortable than “ap #vim

16:13 CST | category / entries / tweets
permanent link | comments?

Mon, 25 Jul 2011

Great article on differences between English and Spanish. I can back most of them up with personal experience. http://esl.fis.edu/grammar/langdiff/spanish.htm

11:16 CST | category / entries / tweets
permanent link | comments?

Thu, 21 Jul 2011

if ( $you_want_to_go_nuts ) { try_to_parse_dates_with_timezones() && in_perl(); }

16:09 CST | category / entries / tweets
permanent link | comments?

Tue, 12 Jul 2011

This is the coolest thing I’ve seen in a long time. I am a sucker for disruptive “poor-tech” that works this well. http://uk.reuters.com/video/2011/07/11/bringing-light-to-the-poor-one-liter-at?videoId=216968892&videoChannel=82

17:21 CST | category / entries / tweets
permanent link | comments?

Fri, 01 Jul 2011

Google went to Vegas and bet it all on Red + Black.

14:55 CST | category / entries / tweets
permanent link | comments?

Wed, 29 Jun 2011

Having to use :Gblame a depressing number of times today. Really? That’s how you chose to do that? #fugitive #vim #git

15:55 CST | category / entries / tweets
permanent link | comments?

Wed, 22 Jun 2011

Just sent a friendly mail to ESR, will see if he responds. What’s the difference between him and RMS? One likes guns, the other likes GNUs.

11:40 CST | category / entries / tweets
permanent link | comments?

Mon, 20 Jun 2011

This blog-post on git branches is full of win. Thank you Mr. Longair: http://longair.net/blog/2009/04/16/git-fetch-and-merge/

20:28 CST | category / entries / tweets
permanent link | comments?

Thu, 26 May 2011

Public Service Announcement: Airport Extreme firmware 7.5.2 is f***d. Downgrade to 7.4.2 if it dies after pushing your changes / update.

20:46 CST | category / entries / tweets
permanent link | comments?

Thu, 19 May 2011

Git Diff “Hunk Headers”

Cool throwaway feature of git diff: “Patch Hunk Headers”.

Basically any diff will regex backwards (depending on detected target language) and put the function or “something sensible” in front of the diff output. That way you can (likely) see if you are putting a print statement in “function foo()” or “function bar()” before committing.
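A rough Python sketch of the “regex backwards” idea, just to show the mechanics. This is not git’s implementation, and the pattern and function name below are my own assumptions:

```python
import re

# Assumed pattern: a line that opens a named function in a few common styles.
FUNCTION_LINE = re.compile(r"^\s*(?:function|def|sub)\s+\w+")

def enclosing_function(lines, changed_index):
    """Scan upward from the changed line and return the nearest function line."""
    for index in range(changed_index - 1, -1, -1):
        if FUNCTION_LINE.match(lines[index]):
            return lines[index].strip()
    return ""  # nothing sensible found above the change

source = [
    "function foo() {",
    "    return 1;",
    "}",
    "function bar() {",
    "    var x = 2;",
    "    print(x);",
    "}",
]

# Pretend the print statement on index 5 is the line we just added.
header = enclosing_function(source, 5)
```

Here the header comes out as the “function bar() {” line, which is the hint you would want next to the hunk.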

[source…]

Please use responsibly.

00:01 CST | category / entries
permanent link | comments?

Mon, 16 May 2011

Software as a Tool

Very rarely do programs reach the level of tool, although I wish more would aspire to.

Do nothing more nor nothing less than I ask.

Do it well and without complaint.

Above all, be consistent.

…and very few others.

Coincidentally, most of these tools have a high learning curve. But perhaps not coincidentally, they are so powerful and consistent that the initial setup cost is dwarfed by their general utility.

12:12 CST | category / entries
permanent link | comments?

Tue, 10 May 2011

“In order to become an official Google product we must restructure our pricing model to obtain sustainable revenue.” http://googleappengine.blogspot.com/2011/05/year-ahead-for-google-app-engine.html

16:24 CST | category / entries / tweets
permanent link | comments?

Tue, 03 May 2011

Dear East Coast, I am neither a “yuman” nor a “yewman”. Thank you for your attention. Sincerely, Robert.

11:43 CST | category / entries / tweets
permanent link | comments?

Wed, 27 Apr 2011

YAML is bullshit “foremost design goal: human readability and serializing arbitrary native data structures” &SS *SS ?: http://bit.ly/kwyVlB

14:27 CST | category / entries / tweets
permanent link | comments?

Tue, 26 Apr 2011

Coolest ChatRoulette experience

So was practicing guitar the other night and needed to get over the fear of having people watching me while I’m playing. I hop on ChatRoulette to practice and discover that there is indeed 99.8% less wang than before (I used to have to cover up the “stranger” window with a post-it note and only take a look if they started chatting). Overall it was a pleasant night, with three standout incidents.

Left me smiling for sure.

On another note, there were 2-3 guys who seemed to get off verbally abusing people anonymously over the internet. But even that was good practice, trying to maintain the beat while people are talking to me, yelling at me, trying to distract me or simply having their own music on in the background or wearing “The Count” from Sesame Street outfits. I just wonder about the shouters… what they like about putting people down.

11:22 CST | category / entries
permanent link | comments?

“It’s more efficient to throw money at performance issues than to throw code at them, but I guess some of you guys just really like to type”

11:17 CST | category / entries / tweets
permanent link | comments?

Tue, 19 Apr 2011

A wise man is sitting at the gate of a large city…

A traveler approached him and asked, “Sir, I’m new here. Could you tell me the kind of people that live in this city?”

After pondering, the wise man asked in return, “And what were the people like where you came from?”

The man replied, “They were unfriendly and mean-spirited.”

The wise man responded, “That’s what they’re like here, too.”

Not long thereafter another traveler approached the city and asked the wise man again the kind of people that lived within the city. “What were the people like in the city that you’ve come from?”

The traveler replied, “Friendly, good-hearted, willing to help their neighbor,” to which the wise man responded, “And that is what they are like here, too.”

[source…]

13:00 CST | category / entries / links
permanent link | comments?

Thu, 24 Mar 2011

“I think the answer is: receive a Commodore 64 for your tenth birthday and no good games.” http://bit.ly/gRN8m6

13:32 CST | category / entries / tweets
permanent link | comments?

Wed, 23 Mar 2011

Wow. Great new implementation on http://search.yahoo.com/ … Tough to say if it leads the pack or not, but is definitely in competition.

15:55 CST | category / entries / tweets
permanent link | comments?

Fri, 11 Mar 2011

Good login cookie protocol here http://fishbowl.pastiche.org/2004/01/19/persistent_login_cookie_best_practice/ … I can’t poke holes in it.

10:57 CST | category / entries / tweets
permanent link | comments?

Thu, 10 Mar 2011

Debugging PHP segfault backtraces with `gdb`

Hello, and welcome. I am going to assume you have gone through the normal PHP documentation about how to get core files, load them into gdb and run the bt (backtrace) command.

In my particular case at work, I ran into a PHP segfault in the oci_execute function. It’s a C function / module for querying Oracle databases which evidently has some sort of crash bug in certain circumstances.

This was clear from the multiple backtraces we had captured, which consistently ended in OCIStmtExecute, exposed in PHP via oci_execute().

(gdb) bt
#0  0x00000000 in ?? ()
#1  0xf6f83cd6 in ttcdrv () from /lib/libclntsh.so.10.1
#2  0xf6e25461 in nioqwa () from /lib/libclntsh.so.10.1
#3  0xf6c97032 in upirtrc () from /lib/libclntsh.so.10.1
#4  0xf6c2dce9 in kpurcsc () from /lib/libclntsh.so.10.1
#5  0xf6be9fba in kpuexecv8 () from /lib/libclntsh.so.10.1
#6  0xf6bec360 in kpuexec () from /lib/libclntsh.so.10.1
#7  0xf6c60b3a in OCIStmtExecute () from /lib/libclntsh.so.10.1
#8  0xf7609199 in php_oci_statement_execute
    (statement=0xf1446534, mode=138243868) at /...442
#9  0xf760fdc5 in zif_oci_execute (ht=1, return_value=0xf1325ab4,
    return_value_ptr=0x0, this_ptr=0x0, return_value_used=1) at /...1302
#10 0x081ab5c1 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff60d0) at /...200
#11 0x081aad69 in execute (op_array=0xf61a00b8) at /...92
#12 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff6500) at /...234
#13 0x081aad69 in execute (op_array=0xf7b9378c) at /...92
#14 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff6650) at /...234
#15 0x081aad69 in execute (op_array=0xf7ba6a9c) at /...92
#16 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff6cd0) at /...234
#17 0x081aad69 in execute (op_array=0xf5c2db80) at /...92
#18 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff7000) at /...234
#19 0x081aad69 in execute (op_array=0xf5b69434) at /...92
#20 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff99c0) at /...234
#21 0x081aad69 in execute (op_array=0xf5ad5684) at /...92
#22 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff9d20) at /...234
#23 0x081aad69 in execute (op_array=0xf5ad46fc) at /...92
#24 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffffb090) at /...234
#25 0x081aad69 in execute (op_array=0xf7b72f1c) at /...92
#26 0x08191311 in zend_execute_scripts
    (type=8, retval=0x0, file_count=3) at /...1135
#27 0x08157b1a in php_execute_script (primary_file=0xffffd640) at /...2064
#28 0x0820e6b4 in main (argc=18, argv=0xffffd724) at /...1176

Looking at the PHP docs, there were a few more items to try.

(gdb) print (char *)(executor_globals.function_state_ptr->function)->common.function_name
$1 = 0x218bac "oci_execute"

(gdb) print (char *)executor_globals.active_op_array->function_name
$2 = 0xb7beaa5c "getData"

(gdb) print (char *)executor_globals.active_op_array->filename
$3 = 0xb7b4e4c4 "DatabaseConnection.php"

Great… still oci_execute, and yep, it’s called via the database wrapper. But what SQL is being executed? What is the function that is calling the generic getData() function?

Obviously there are other “execute” items on the stack, but how to get to them, and what to do with them once you’re there?

With some help from the wonderful #gdb ambassador “xdje” on irc.freenode.net, I was able to get a crash course on using gdb in a slightly more advanced way for debugging PHP core files.

You already know bt which will print you out the backtrace. After that I started with help but I didn’t know how to describe what I was looking for so that was basically a dead end.

Through some googling, I learned a little about the frame #, up, and down commands, which move your context through the backtrace. But I couldn’t understand why, even when I moved through the stack, the print ... commands from the PHP web page always printed out the same data.

Then I learned about the info ... subcommands. Ok, now we are getting somewhere. In the example above, I knew there was important stuff located in frames 13, 15, 17, etc. which I hoped contained the PHP function names that were responsible for each execute... line. From a “C” perspective, the function in question is always execute() but I needed to know what was going through PHP’s tiny little brain.
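Picking the execute frames out of a long backtrace by eye gets old quickly. Here is a minimal sketch in Python (not something I used in the original session) that pulls those frame numbers out of saved bt output. The names FRAME_RE and frames_named are made up for illustration, and the regex assumes the frame-line layout seen in the backtrace above.

```python
import re

# Matches the first line of a gdb frame, e.g.
# "#13 0x081aad69 in execute (op_array=0xf7b9378c) at /...92"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def frames_named(backtrace_text, function="execute"):
    """Return the frame numbers whose C function is exactly `function`."""
    numbers = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        # Continuation lines start with whitespace and never match.
        if match and match.group(2) == function:
            numbers.append(int(match.group(1)))
    return numbers


# A few frames copied from the backtrace above.
sample = """\
#10 0x081ab5c1 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff60d0) at /...200
#11 0x081aad69 in execute (op_array=0xf61a00b8) at /...92
#12 0x081aafb5 in zend_do_fcall_common_helper_SPEC
    (execute_data=0xffff6500) at /...234
#13 0x081aad69 in execute (op_array=0xf7b9378c) at /...92
"""

print(frames_named(sample))  # [11, 13]
```

Each number it returns is a frame worth visiting with the frame command.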

From help info, info locals looks promising.

(gdb) bt
....

(gdb) frame 13
#13 0x081aad69 in execute (op_array=0xb7bead08) at zend_vm_execute.h:92
92      in zend_vm_execute.h

(gdb) info frame
Stack level 13, frame at 0xbff17080:
 eip = 0x81aad69 in execute (zend_vm_execute.h:92); saved eip 0x1cbad4
 called by frame at 0xbff170d0, caller of frame at 0xbff16e10
 source language c.
 Arglist at 0xbff17078, args: op_array=0xb7bead08
 Locals at 0xbff17078, Previous frame's sp is 0xbff17080
 Saved registers:
  ebx at 0xbff17070, ebp at 0xbff17078, esi at 0xbff16e00,
  edi at 0xbff17074, eip at 0xbff1707c

(gdb) info locals
execute_data = {opline = 0xb7b55e88, function_state =
  {function_symbol_table = 0xb7150838, function = 0xa270c48,
  reserved = {0x8188816, 0xbff170c0, 0x0, 0xbff17068}}, 
  fbc = 0x0, op_array = 0xb7bead08, object = 0x0, Ts = 0xbff16e50,
  CVs = 0xbff16e20, original_in_execution = 1 '\001', symbol_table = 0xb714f98c, 
  prev_execute_data = 0xbff175e0, old_error_reporting = 0x0}

That looks promising, but I have no idea what I’m looking at or how to figure out what is hidden away in this core dump. At this point my IRC buddy xdje jumped in to save the day.

Let’s go back to the original PHP print commands to understand them, but first a slight detour back to the “info frame” output from above. It turns out that gdb accepts “expressions in the programming language of the core dump” (or something). That suddenly makes the “source language c.” line in the following output very important:

(gdb) info frame
Stack level 13, frame at 0xbff17080:
 eip = 0x81aad69 in execute (zend_vm_execute.h:92); saved eip 0x1cbad4
 called by frame at 0xbff170d0, caller of frame at 0xbff16e10
 source language c.
 Arglist at 0xbff17078, args: op_array=0xb7bead08
 Locals at 0xbff17078, Previous frame's sp is 0xbff17080
 Saved registers:
  ebx at 0xbff17070, ebp at 0xbff17078, esi at 0xbff16e00,
  edi at 0xbff17074, eip at 0xbff1707c

…and it helps explain the original print statement, which is mostly C-style struct and pointer traversal.

(gdb) print (char *)executor_globals.active_op_array->function_name
      ^-- GDB command
            ^-- cast to a type
                     ^-- global variable
                                     ^-- struct member (a pointer)
                                                      ^-- struct member

OK, so what is executor_globals? Turns out (don’t ask how just yet) that it is basically a global variable. So you go daisy-chaining down the line of struct members and pointers and finally you can get to function_name. Dandy.

Now what else can we do with print? How can we print the function_name of the current position in the stack?

(gdb) info frame
Stack level 13, frame at 0xbff17080:
 eip = 0x81aad69 in execute (zend_vm_execute.h:92); saved eip 0x1cbad4
 called by frame at 0xbff170d0, caller of frame at 0xbff16e10
...this is important
 source language c. 
...aha, here is another op_array
 Arglist at 0xbff17078, args: op_array=0xb7bead08 
 Locals at 0xbff17078, Previous frame's sp is 0xbff17080
 Saved registers:
  ebx at 0xbff17070, ebp at 0xbff17078, esi at 0xbff16e00,
  edi at 0xbff17074, eip at 0xbff1707c

...hrm, not what I expected
(gdb) print 0xb7bead08->function_name 
Attempt to extract a component of a value that is not a structure pointer.

...maybe it needs that char * thing?
(gdb) print (char *)0xb7bead08->function_name 
Attempt to extract a component of a value that is not a structure pointer.

...oh, there are operator precedence rules at play here
(gdb) print ((char *)0xb7bead08)->function_name 
Attempt to extract a component of a value that is not a structure pointer.

...here is where IRC buddy saved the day
(gdb) info args 
op_array = (zend_op_array *) 0xb7bead08

...aha! you have to treat that memory address as a *type* and then you can get data off of it
(gdb) print ((zend_op_array *)0xb7bead08)->function_name 
$11 = 0xb7beaa5c "getData"

Ok, so now we’re getting somewhere. But guess what. There is an even simpler way of doing the above.

...d'oh! you can reference it just like a variable, nothing fancy required
(gdb) print op_array->function_name 
$11 = 0xb7beaa5c "getData"
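The failed attempts above all come down to the same thing: a bare address is just a number, and you can only pull fields off it once it is treated as a pointer to a struct type. Here is a rough analogy using Python’s ctypes, as a sketch only. The OpArray class is a hypothetical, heavily cut-down stand-in for zend_op_array (two fields, not the real layout), and function_name_at is a name I made up.

```python
import ctypes


# Hypothetical, simplified stand-in for PHP's zend_op_array.
# Only two fields are modeled; this is NOT the real struct layout.
class OpArray(ctypes.Structure):
    _fields_ = [
        ("type", ctypes.c_ubyte),
        ("function_name", ctypes.c_char_p),
    ]


def function_name_at(address):
    """Read function_name from the OpArray stored at a raw address."""
    # Same idea as: print ((zend_op_array *)0xb7bead08)->function_name
    op_array_ptr = ctypes.cast(address, ctypes.POINTER(OpArray))
    return op_array_ptr.contents.function_name


# Build one struct in this process so there is a real address to point at.
op_array = OpArray(type=2, function_name=b"getData")
raw_address = ctypes.addressof(op_array)  # a plain integer

print(function_name_at(raw_address))  # b'getData'
```

Asking the plain integer for raw_address.function_name fails in Python for much the same reason gdb complained that the value “is not a structure pointer”.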

…but how do I know that I can access function_name off of that op_array thing? More help from my IRC buddy exposes the ptype command:

...aha! this is like dir(...) in python
(gdb) ptype op_array 
type = struct _zend_op_array {
    zend_uchar type;
    char *function_name;
    zend_class_entry *scope;
    zend_uint fn_flags;
    union _zend_function *prototype;
    zend_uint num_args;
    zend_uint required_num_args;
    ...
} *

So now we’re really getting somewhere.

...backtrace
(gdb) bt 
...

...jump to where i want info
(gdb) frame 13 

...hrm, see the op_array variable
(gdb) info args  

...there's a function_name field, looks useful
(gdb) ptype op_array 

...this is what i'm looking for
(gdb) print op_array->function_name 
$13 = 0xb7beaa5c "getData"

...go to the next function call up the chain
(gdb) frame 15 

...print its name too
(gdb) print op_array->function_name 
$14 = 0xb7f6d248 "getMissingDates"

...i wonder if I can get the SQL for this as well
(gdb) frame 8 
#8  0x00210199 in php_oci_statement_execute (statement=0xb17ccfbc, mode=171218828)
    at oci8_statement.c:442
442     oci8_statement.c: No such file or directory.
        in oci8_statement.c

(gdb) info args
statement = (php_oci_statement *) 0xb17ccfbc
mode = 171218828

(gdb) ptype statement
type = struct {
    int id;
    int parent_stmtid;
    php_oci_connection *connection;
    sword errcode;
    OCIError *err;
    OCIStmt *stmt;
    char *last_query;
    long int last_query_len;
    HashTable *columns;
    HashTable *binds;
    HashTable *defines;
    int ncolumns;
    unsigned int executed : 1;
    unsigned int has_data : 1;
    ub2 stmttype;
} *
...hrm, last_query looks useful?
(gdb) print statement->last_query 
$15 = 0xb1546574 "select ... from date_table where ..."

(gdb) print "bingo"
You can't do that without a process to debug.

Wow. Now that is pretty productive. I started with a PHP script that randomly segfaulted and dumped core. From the core file I can generate a “C” backtrace. Walking through the “execute” statements I can get at basically the “PHP” backtrace. And furthermore I can walk a bit more “forward” into “C-land” and get info from the C module itself (statement->last_query).
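Typing frame N and print over and over for every execute frame is tedious, so here is a small sketch in Python that writes those gdb commands out for a list of frame numbers. It is an illustration, not something from the original session: gdb_commands is a made-up name, and it only emits commands already used above (filename is the same field printed earlier via active_op_array).

```python
def gdb_commands(frame_numbers):
    """Build gdb commands that print the PHP function behind each frame."""
    lines = []
    for number in frame_numbers:
        lines.append(f"frame {number}")
        lines.append("print op_array->function_name")
        lines.append("print op_array->filename")
    # One command per line, each newline-terminated.
    return "".join(line + "\n" for line in lines)


# Frames 13 and 15 are the two execute frames inspected above.
print(gdb_commands([13, 15]), end="")
```

Save the output to a file and load it inside gdb with the source command to get a rough “PHP backtrace” in one go. In the frame for the top-level script, function_name may well come back as 0x0.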

Messing with GDB was really quite rewarding and just so long as I post this guide to my blog, I won’t forget it next time I need to go debugging PHP segfaults.

21:55 CST | category / entries
permanent link | comments?

Tue, 08 Mar 2011

“But now the Senior Vice President for Bad Decisions at Yahoo had decided to give us a little help.” http://pinboard.in/blog/173/

14:45 CST | category / entries / tweets
permanent link | comments?

Wed, 02 Mar 2011

13:03 CST | category / entries / tweets
permanent link | comments?

Wed, 23 Feb 2011

Fat cats go down alleys eating birds.

19:44 CST | category / entries / tweets
permanent link | comments?

Fri, 11 Feb 2011

14:43 CST | category / entries / tweets
permanent link | comments?

Tue, 08 Feb 2011

Cool! http://sharkbait.computerworld.com/node/2585 … A 10-digit number starting with “214”. In other words, a Dallas phone number.

17:25 CST | category / entries / tweets
permanent link | comments?

Thu, 27 Jan 2011

Cool throwaway feature in iOS4.2

AirTunes remote speakers PLUS (I think) Bluetooth speakers IN ANY APP WITH AN AUDIO VOLUME BAR. I was able to repurpose an old iPod touch which could always have run Pandora, but now we can have the iPod next to the couch streaming over Wi-Fi instead of sitting next to the stereo and speakers, tethered by a stereo cord. When you have people coming to a party, now any of them can play music on the speakers (I think?) after going through the sync process. Très cool. With this simple update I can’t recommend highly enough that you get one of the Airport Express thingies for your stereo.

11:30 CST | category / entries
permanent link | comments?

Tue, 25 Jan 2011

Be still my heart: http://www.abunchofutils.com/utils/developer/cron-expression-helper/ helps you make crontab lines.

14:05 CST | category / entries / tweets
permanent link | comments?

Tue, 11 Jan 2011

10:19 CST | category / entries / tweets
permanent link | comments?
