Bob Balaban's Blog

     
    alt

    Bob Balaban

     

    Geek-o-Terica 9: Memory Sub-allocator (or, why your server’s memory usage only goes up), and Happy Father’s Day

    Bob Balaban  June 26 2009 12:12:42 AM
    I did an exploration of product memory management many years ago. I worked on a couple of different Lotus products (1-2-3, Notes) that had sophisticated memory-management packages in them. All of them were what you might call "sub-allocators", basically they grabbed big chunks of memory from the operating system, and broke up the big chunk into smaller chunks for program code to use.

    This past Sunday was Father's Day, a nice, family-oriented holiday, essentially invented by Hallmark. I got to thinking about my father, and also about my father's father. He was an ordinary, yet interesting man, born either on the boat over from the Ukraine (or maybe Byelorus, the people who knew for sure have been dead a long time, and nobody's sure), or shortly after arrival in 1900. His name was Sam. My father relates that Sam went into the clothing business in New York City as a young man. When the Depression hit hard, in the early 1930s (editorial comment: it was MUCH worse than the one we're in now, bad as that is...), Sam got laid off, and had the guts to start his own business. He formed a company to buy large amounts of fabric from the cloth manufacturers (this was when the United States still had a textile industry, now long gone), and sell that cloth in smaller chunks to the clothing makers of New York's Garment District. One of his sons, my Uncle Ed, joined the business. My father, Al, did not. Al went to college, then got drafted into the Army during World War 2.

    My grandfather used to be a sub-allocator, though he never would have called it that. The analogy works really well. Sam thought of himself as a broker: he'd get enough money together (couldn't have been easy in the early 30's) to buy big bolts of cloth, haul a sample case around to the clothing manufacturers and sell them what they needed. He tended to specialize in cloth for women's dresses. Sam's company lasted until the mid 1960s, I remember going to visit "the office" once or twice. Rolls of cloth everywhere.

    How is that like memory management in Notes or 1-2-3 (does anyone still use 1-2-3?)? There were 2  problems with memory allocation in earlier versions of Windows: it was slow, and each chunk that you would get from the OS represented one "system handle", regardles of the size of the chunk. And system handles were a finite resource, you couldn't get more than (in one version of Windows NT, as I recall) 4096 of them. When they were gone, you were "out of memory", and you'd mostly likely crash. The other problem with it was that it was slooowwww. Not so much getting  the memory, but "freeing" it, or retuning memory you no longer needed back to the system was very slow. Why? Because of something called "free block consolidation".  The OS would try to figure out when you told it you were done with a particular block of memory, what other "free blocks" there were (if any) adjacent to your's, so it could consolidate them together, raising the odds that the next big allocation would find a contiguous chunk of the size needed easily. Otherwise, the system's memory would become "fragmented", and allocating new chunks would become much more difficult.

    So what does sub-allocation do for you? Say 3 pieces of a program each want around 100KB of memory for a little while. They could each go to the Windows API and request 100KB, but that would use up 3 (scarce) system handles. But what if, instead, we had a "memory manager", a "broker" (we could name it Sam) who, instead would track requests for memory allocations. "Sam" could request say, 64MB of memory from the OS all at once, then break that big chunk up into 3 100KB blocks as requested, using only 1 system handle. If we wrote Sam to be clever enough, he'd hve an efficient way of knowing which bits of his big 64MB chunk were being used, and which were not. And when the 3 program tasks were done with their 100K bits, they could "free" them with a call back to Sam, who could efficiently consolidate them back into his pool of available memory.

    We (the 1-2-3 and Notes product teams) did write such memory managers (we didn't call them Sam, though), and they worked a whole lot better and a whole lot faster than the Windows API did.

    One interesting side-effect of using a memory sub-allocator in a product is something that people who pay careful attention to system memory usage often notice. If you use a system monitor (say Windows Task Manager) to track system memory usage, and then run some kind of performance test suite (maybe a browser client hitting a web page over and over, or hitting different web pages over and over to avoid the page-caching optimization), you see what might appear to be strange behavior: memory usage goes up, up, up, up, and then plateaus, but rarely does it go back down again.

    Why does it happen this way? Well, Sam could have told you. The sub-allocator grabs memory from the OS in bug chunks, and parcels it out in smaller chunks. If it's block gets used up, it goes back to the OS for another (big) block. Both big blocks would be notieced by TaskManager. But sub-allocating out of those blocks is not registered by Task Manager, because it's done only within the program. When a task (HTTP server, for example) asks for a bunch of memory to process an outgoing web page, it gets a sub-allocated block. When the HTTP task returns ("frees") that smaller block back to the memory manager ("Sam"), the memory manager does NOT go return it to the OS. First of all, it's too small, it doesn't represent (usually) a full "handle's worth" of memory that was obtained from the OS in the first place. Secondly, Sam is going to keep it around because he knows that it's probably going to be needed again, laer, for something else.

    That's why you see (at the system level) more and more memory going to the product's process, and only once in a while (when Sam has an entire big block of memory that he got from Windows free again, and decides he doesn' need it anymore) does the number go down again, maybe when the server is idle for a while.

    So, there you have it. Yet another example of art imitating life. Here's to my father Al, on Father's Day, and here's to my Grandfather Sam, sadly now no longer with us. I love you, Dad.
    And here's to all fathers out there, everywhere. Happy Father's Day!

    Geek ya later!

    (Need expert application development architecture/coding help? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article ┬ęCopyright 2009 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.
    Comments

    1Mark Ryan  8/2/2009 5:24:52 PM  Geek-o-Terica 9: Memory Sub-allocator (or, why your server’s memory usage only goes up), and Happy Father’s Day

    Bob,

    Nice to see you independent. Last time I emailed you were leaving for IBM.

    It is rare that anybody talks much about memory allocation, but I loved your tie into the family business.

    I once wrote a memory allocator and devised some interesting tricks to improve performance, most importantly in the areas of garbage collection and holding onto memory handles (sub-handles) to optimize for concatenation operations. I imagine that kind of stuff is standard now, but it was new back in 1983 when I first worked on it while writing the run time system for a compiler.

    FYI: I still have my goes to 11 t-shirt. Never worn, just preserved!

    2Bob Balaban  8/2/2009 7:08:46 PM  Geek-o-Terica 9: Memory Sub-allocator (or, why your server’s memory usage only goes up), and Happy Father’s Day

    @Mark - Man, you gotta WEAR the t-shirt for the juju to take effect!! :-)

    When I worked on 123/G at Lotus, there was a really super-fast memory allocator for fixed-size blocks. If you knew that you were going to need a bunch of mem chunks, all of the same size, you could create a pool of specified chunk-size, and it would alloc and de-alloc really, really fast. Haven't seen the like since (ca. 1989), but it was a great optimization.