My upgrade from Debian 7 (Wheezy) to 8 (Jessie)

Published / by Steve

I started with the official docs (https://www.debian.org/releases/stable/i386/release-notes/ch-upgrading.en.html) and followed them through the first reboot. At this point I found myself with a black screen and I couldn’t get to VT1 or VT2. Booting to recovery mode worked, so I could read the logs.

nVidia drivers
In the end I had to entirely remove the nvidia-related packages that came from Debian, then download and run the installer from the nVidia website. I spent a bunch of time trying to avoid this since it shouldn’t be necessary, but in the end it was the approach that finally worked. I used this page: http://www.allaboutlinux.eu/remove-nouveau-and-install-nvidia-driver-in-debian-8/
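
For reference, the removal step looked roughly like this (a sketch only – the package list and installer filename vary from system to system, and the linked page has the actual procedure I followed):

# as root: remove every Debian-packaged nvidia component, then run the downloaded installer
apt-get remove --purge $(dpkg -l | awk '/^ii +nvidia/ { print $2 }')
sh ./NVIDIA-Linux-x86_64-*.run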

More black screens
They weren’t over yet. In kern.log I found “NVRM: failed to copy vbios to system memory”. I found reports that kernels in the 3.10 to 3.18 range have issues with nVidia graphics. The Jessie kernel is 3.16, and I wanted to stay with a stock Debian kernel. Fortunately there is a workaround: the rcutree.rcu_idle_gp_delay=1 kernel parameter. It goes in /etc/default/grub; I added it to GRUB_CMDLINE_LINUX_DEFAULT:

GRUB_CMDLINE_LINUX_DEFAULT="quiet rcutree.rcu_idle_gp_delay=1"

After running ‘update-grub’, my laptop booted into graphics mode.

“Oh no! Something has gone wrong.”
Not that the story ends there; the message “Oh no! Something has gone wrong” appeared after I rebooted. But I was happy, because at least I was in graphics mode and I had a new error to chase. My research suggested this had something to do with the Gnome desktop. Earlier in the upgrade process one of the steps had uninstalled the Trinity desktop environment that I use, so I reinstalled it, choosing its components in the configuration questions it asked. And then suddenly, *finally*, everything was back! My desktop settings and my custom menu were there, WiFi worked, printing worked, and I could begin to try out the upgraded system.

This was not the worst operating system upgrade I’ve done, but it was probably the worst Linux upgrade I’ve done. I start operating system upgrades by cloning to a new drive so I can swap back whenever I need to. That was a lifesaver here.

Nanofunctions

Published / by Steve

When you’re attempting to break up a monolith, one of the big difficulties is that it can be hard to find good microservices – ones which are logically coherent and map to a clear bounded context, but aren’t so large that they’re more mega-services than microservices.

In 2016 there’s a new possibility: go further, splitting the monolith into many separate “service” level functions, each of which is deployed separately (and can be scaled separately). For example the GetOrders function would be separate from the CreateOrder function, which is separate from the DeleteOrder function. (This example comes from my notes from the “Serverless meets SaaS: The Ultimate Match” session at QCon SF 2016, which was based on Amazon Lambda.)
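
To make that concrete, here’s a hedged sketch of what one such function could look like as a Lambda handler in Java. The class and the request/response shapes are invented for illustration (the session didn’t show code), but RequestHandler is Lambda’s standard Java entry point.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical GetOrders nanofunction; CreateOrder and DeleteOrder would be
// separate handlers, each deployed and scaled on its own.
public class GetOrdersHandler implements RequestHandler<Map<String, String>, List<String>> {
    @Override
    public List<String> handleRequest(Map<String, String> request, Context context) {
        String customerId = request.get("customerId");
        // In a real function this would read from the shared order storage.
        return Arrays.asList("order-1001:" + customerId, "order-1002:" + customerId);
    }
}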

Since these functions are much smaller than microservices, I’ve begun thinking of them as “nanofunctions”. Amazon Lambda is not the only offering entering this space, just the one I’m most familiar with as of today.

Amazon Lambda’s catch phrase is “serverless compute”. “Serverless” is excessive hype IMHO. There most certainly is a server. The point is that you don’t set it up, configure it, or worry about it in general – you pay someone else to do that for you. “Function as a Service” (FaaS) is a much more descriptive name, but unfortunately not as catchy.

In this model each “service” level function is deployed separately. This is a very different approach from microservices. With microservices you strive to avoid sharing storage such as database tables, because that results in integration at the database level and there is plenty of industry experience of how painful that can become.

With these “nanofunctions” there’s no practical way to avoid integration at the storage level. GetOrders has to somehow read the data stored by CreateOrder or it can’t do its job. This is where it starts to feel like uncharted territory – you’ll have to keep track somehow of which nanofunctions share storage, and keep your docs up to date, or you could have unpleasant surprises in your future. Maybe it’s an environment where you just don’t make a change without asking what other components will be affected? Maybe instead of one team owning a microservice you have one team that owns a group of nanofunctions, whether it’s a logically coherent group or one that was chosen arbitrarily?

It’s early days yet for this idea. I’m finding it highly intriguing.

Making a CVS project read-only

Published / by Steve

As I posted last time, I’ve been incrementally moving CVS projects to Git repositories. After each project is migrated I’ve been making its portion of the CVS repository read-only, so that access is still possible in case we need to go back and look at something, but new commits can’t come in and cause the CVS repository to get out of sync with the Git repository.

The Internet contains a number of ways to make a CVS project read-only; the first one that actually worked for me was to set up a pre-commit hook for the project that blocks commits for that project. It’s based on the “Your Catchphrase Here!” post Read-only CVS access for only certain projects, in particular the read-only-project.sh script.

Pre-commit hooks are defined in the CVSROOT project, so the first step was to check that out with ‘cvs checkout CVSROOT’. Next I created a read-only-project.sh script based on the “Your Catchphrase Here!” blog. I made it executable so I could test locally; I’m not sure if this is necessary but since it was working I left it. Then I added it to CVS and committed.
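
For reference, here is a minimal sketch of what such a script can look like (the wording of the message is mine, not the blog’s; the part that matters is the non-zero exit status, which tells CVS to reject the commit):

#!/bin/sh
# read-only-project.sh - commitinfo hook that refuses every commit to this project
echo "This project has moved to Git; its CVS module is now read-only." >&2
exit 1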

But initially the hook didn’t work. In order for the hook to run the script, the CVS repository needs to contain the file “read-only-project.sh”, not just the file “read-only-project.sh,v”. To fix that I modified the file checkoutlist (in the CVSROOT project), adding read-only-project.sh to its contents, and committed. As soon as that finished, the CVSROOT project directory on the CVS server contained the file “read-only-project.sh”.

The next step was to modify the commitinfo file, which is where the hooks are configured. I added a line like
^project-name /[path]/read-only-project.sh
This is the biggest difference from the “Your Catchphrase Here!” blog: I don’t have anything after the project-name pattern, and nothing after the script name either. I don’t know why this is; it’s just the result of trial and error. For the record, the CVS server I’m working with is version “1.11.23 (c) 2006”.

Once set up, this approach works nicely. Commits are blocked, and you get an informative message explaining why when you try to commit.

Notes on migrating from CVS to Git (part 2 of ?)

Published / by Steve
  • when I began moving a second project from CVS, one unexpected difficulty was that Git doesn’t support putting empty directories in version control but CVS does. (I found this because I did a recursive diff of the directory trees checked out from CVS and from Git.) I had to create a hidden file for each such directory that had been in CVS, so that Git would have a file to commit. I called it .keep so it would be a hidden file (on Linux at least), and put a couple of lines inside explaining why it was there. (There’s a rough sketch of the commands after this list.)
  • so far the biggest problem has been the SOCKS proxy that our ssh connections have to go through, rather than Git itself. Once someone gets to the point where they can ssh to our server through the proxy using key-based authentication, the worst is usually over.
  • Eclipse has been a pain point:
    • for the EGit plugin in Eclipse to work through the proxy, the environment variable GIT_SSH needs to be set correctly. This causes EGit to use an external ssh implementation instead of its internal one. We had to use ssh keys without a passphrase, because we couldn’t get Eclipse to prompt for one. (I’ve seen posts saying that it should – perhaps the behavior changed with Eclipse 4.0?)
    • another oddity: on some systems Eclipse wouldn’t see this environment variable until Windows was rebooted. We’re running Windows 7 64-bit. I did not see this myself, but rebooting has become my go-to recommendation when things just aren’t working the way they should. (It’s Windows, after all.)
  • there’s been a minor recurring difficulty – writing about it. When updating our internal wiki pages, “Git” looks odd and “git” doesn’t look like a name.
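
A rough sketch of the empty-directory workaround mentioned above (paths are placeholders, and -x CVS keeps the CVS metadata directories out of the comparison):

# directories (and files) present in the CVS checkout but missing from the Git checkout
diff -r -x CVS cvs-checkout git-checkout | grep '^Only in cvs-checkout'
# for each empty directory reported, give Git a file it can track
echo "Placeholder so Git keeps this otherwise-empty directory." > git-checkout/some/empty/dir/.keep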

The biggest remaining challenge may well be making a part of a CVS repository read-only… So far I’ve found at least 4 ways to do that, and the simple one didn’t work.

Notes on migrating from CVS to Git (part 1 of ?)

Published / by Steve

When I started on this I hoped to find something that allowed the sort of easy coexistence that git-svn does, but no such luck. There are two tools for importing from CVS to Git that I tried: git-cvs and cvs2git.

  • git-cvs: this uses git cvsimport and has a dcommit command like git-svn.
    • the import appeared to work, but when I attempted to build the code the build failed. A small number of files in Git were at older revisions than they were in CVS. It seemed that the newer revisions simply didn’t get imported.
    • git-cvs supports incremental imports to an existing repository. I used this for over 6 months before switching to cvs2git, to track the ongoing changes in CVS, so that I could use git’s tools to look at changesets, find the origins of changes, and visualize and search the history of changes across multiple branches.
    • I was hoping to use this to do the final import, but could find no errors to explain why some of the files were at older revisions.
  • cvs2git: this is part of the cvs2svn project, which can make the documentation a bit confusing at times, but with a little experimentation it becomes clear.
    • after importing to git and cloning the repository, I recursively compared the files with a CVS checkout done at the same time. The only difference was where there were tags that CVS expands on checkout, because git doesn’t expand those tags.
    • there’s no way to do an incremental import into an existing repository; you have to start over every time. This means that the cutover has to be big-bang style, with all commits to CVS stopped and no commits to git until the conversion is complete.

When it came time to migrate I had to use cvs2git and arrange a big-bang cutover, which I did during two consecutive days of meetings spent planning the upcoming quarter; the meetings conveniently prevented anyone from having the time to make changes worth committing.
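
For anyone curious, here is a sketch of what a cvs2git conversion typically looks like (not my exact commands; paths and names are placeholders, and you should check the cvs2git documentation for the options your version supports):

mkdir cvs2git-tmp
cvs2git --blobfile=cvs2git-tmp/git-blob.dat --dumpfile=cvs2git-tmp/git-dump.dat \
        --username=cvs2git /path/to/cvsrepo/project-name
git init --bare project-name.git
cd project-name.git
cat ../cvs2git-tmp/git-blob.dat ../cvs2git-tmp/git-dump.dat | git fast-import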

Note: there is another git-cvs on GitHub: git-cvs by ustuehler. As far as I can tell this is entirely different from the one by osamuaoki. Unfortunately I found this one too late to try it out.

Live demo thoughts

Published / by Steve

Before attempting to do a live demo of how to use Git via ssh, make sure that you can ssh to the server, through any tunnels which may be needed, making any VPN connections needed, while using the conference room WiFi or wired connection. Not the WiFi at your desk, and certainly not the wired connection at your desk, but in the conference room.

I’m just saying.

When your Java method is clearly too damn big

Published / by Steve

Personally I strongly prefer small methods. Reasonable people can disagree about when a method is too big, but when you get an error message like this, your method is too damn big.

java.lang.ClassFormatError: Invalid Method Code length <123456> in com/example/MyClass

That’s the JVM telling you that the method’s bytecode is larger than the JVM can cope with (the class file format caps a single method’s code at 65535 bytes). 123456 is the size of the method bytecode in bytes.

Now I have not actually seen a method so long that the code alone triggered this error – it happened when we used PowerMock to mock a large class with large methods, then tried to use Cobertura to measure our test coverage.

The fix: breaking up the large methods. When you see this error, it’s time.

Steps towards breaking up a monolith

Published / by Steve

Okay, so you’ve got a large monolithic web application. It’s big enough to be hard to maintain, enhance, and scale. But from a business perspective it’s successful enough that you can’t afford to just close it down and walk away. A microservice/SOA approach looks like the way to go, but how to get from here to there?

One approach to consider: begin by refactoring the code for clarity. This is useful when defining a good set of services is hard – for example, when the application implements complex business logic, there isn’t a good set of specifications for what the business logic should be, and there’s nobody left who knows. Or when it’s less than clear where to draw the lines that would create a clean set of bounded contexts, or when the clear boundary results in a service so big it feels like it’s half the size of the monolith. It’s much faster to refactor within a single codebase than to rearrange services after they’ve been split out, and it will improve your ability to maintain the codebase as soon as it’s done.

Note: this discussion assumes that the monolith is using a relational database. Use of NoSQL or some other storage model doesn’t actually change much, it’s just that trying to cover multiple possibilities made this post verbose and cluttered.

Review what you have

Start by reviewing the existing codebase. Somewhere in it there must be classes which are model objects/DTOs, and classes which access the data store (DAOs). There must be controllers which accept HTTP requests and produce HTTP responses. With any luck there are some “service” classes which orchestrate calls to the DAOs to implement business logic on behalf of the controllers.

(If there are objects which fit into more than one category, that will get cleared up as we go.)

Hopefully each set of classes will have its own package. If not, start by moving them into separate packages. Don’t worry if they don’t fit neatly at this early stage; additional clarity will emerge as more refactoring is completed.

Model Objects

Now go through the model objects/DTOs. These are candidates for service request and response bodies. There should be little to no business logic in these classes. Any database queries should be moved to the service classes.
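
As a hedged illustration (the class is invented, not from any particular codebase), a model object after this step is little more than fields and accessors:

import java.math.BigDecimal;

// Hypothetical DTO: plain data, no business logic, no database access.
public class OrderSummary {
    private final long id;
    private final String customerId;
    private final BigDecimal total;

    public OrderSummary(long id, String customerId, BigDecimal total) {
        this.id = id;
        this.customerId = customerId;
        this.total = total;
    }

    public long getId() { return id; }
    public String getCustomerId() { return customerId; }
    public BigDecimal getTotal() { return total; }
}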

DAOs

Next are the DAOs. All database calls go in the DAOs. If there are still any that aren’t inside a DAO, move them into one. Check the service classes, as database calls like to hide there. Create new DAOs as needed.

Note that as a general rule, DAOs should not begin or commit database transactions. Service classes do that so they can orchestrate multiple DAO calls into a single transaction. However it’s not necessary to fix them all at this point – that can be done after the service classes have been refactored.
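
Here’s a hedged sketch of that division of labor, assuming plain JDBC and invented DAO interfaces: the service class owns the transaction, and the DAO calls run inside it.

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

interface OrderDao { void markCancelled(Connection conn, long orderId) throws SQLException; }
interface AuditDao { void recordCancellation(Connection conn, long orderId) throws SQLException; }

public class OrderService {
    private final DataSource dataSource;
    private final OrderDao orderDao;
    private final AuditDao auditDao;

    public OrderService(DataSource dataSource, OrderDao orderDao, AuditDao auditDao) {
        this.dataSource = dataSource;
        this.orderDao = orderDao;
        this.auditDao = auditDao;
    }

    // The service begins and commits the transaction; two DAO calls become one unit of work.
    public void cancelOrder(long orderId) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try {
                orderDao.markCancelled(conn, orderId);
                auditDao.recordCancellation(conn, orderId);
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}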

Controllers

Next up are the controllers. Refactor the controllers to talk only to services, not DAOs. This may result in service methods which are simply pass-throughs to DAOs, or that do little more than begin and end transactions. That’s okay; we’re laying the groundwork for separate services. Once those services are actually separate, the controller will need a web service client with provisions for error handling, and that merits a separate method in a class outside the controller.
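
A hedged sketch of where this ends up, using the plain servlet API and the invented OrderService from the previous sketch – the controller translates HTTP into a service call, translates failures back into HTTP, and never touches a DAO or SQL:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class OrderController extends HttpServlet {
    private OrderService orderService;   // wired up at startup; details omitted

    @Override
    protected void doDelete(HttpServletRequest request, HttpServletResponse response) throws IOException {
        long orderId = Long.parseLong(request.getParameter("orderId"));   // minimal validation omitted
        try {
            orderService.cancelOrder(orderId);               // delegate to the service...
            response.setStatus(HttpServletResponse.SC_NO_CONTENT);
        } catch (Exception e) {                              // ...and translate failures to HTTP
            response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, "Could not cancel order");
        }
    }
}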

Service Classes

Finally we come to the service classes. At this point each service class is a candidate to become a separate web service. This is the most difficult part of the process. Take a step back and review each service class – is it implementing a bounded context? Does it minimize coupling and maximize cohesion? Is the size reasonable? Smaller is better. A good goal is fewer than 10 public methods exposed to controllers. If there are more than 25, definitely look for ways to split up the class.

If the service classes aren’t meeting these criteria, split them up and rearrange methods so that they do. This step will require multiple iterations, possibly many. If it seems overwhelming or you don’t know where to start, try just moving one or two of the most obvious methods. If a class is very large you may need to arbitrarily split it in half, so that ServiceClass becomes ServiceClass1 and ServiceClass2. This will at least result in more manageable pieces. (And it makes the code smell of an overly-large class more obvious, which is actually a good thing.)

Once you move a few of the obvious methods out of the way, the remaining structure will become clearer with less to obscure it. Then you can move a few more methods, and begin to create and split up classes. Don’t be surprised if one day you move a method into class A and two days later find that it should be in class B. In fact, expect it. This is why we’ve waited to create services: you can iterate much faster when moving a method from one class to another in the same project than when moving a method from one service to another.

Web Services

Eventually you’ll reach a point where the code begins to look reasonably well structured, and at least some parts of it are firm and have been that way for a few weeks of development. This is where you can begin to split up the monolith, creating new web services from these more stable areas. If you carve off enough web services, the remaining service classes in the monolith will be small enough to each be a separate web service, even if it’s one that nobody wants to look inside.

Web Application

The original monolith will actually live on as a web application which contains the controllers and makes calls to the web services. Nothing wrong with that. Do your best to make it stateless, or limit the scope of the client state stored in the web application, so that it’s straightforward to cluster and scale horizontally. Then celebrate – it’s not a quick journey.

A beautiful race condition observed in the wild

Published / by Steve

Working on legacy code can lead you to seeing some things you’d never otherwise see. A few years ago I read the blog post “A Beautiful Race Condition”. It has a very detailed explanation of what can happen under the hood of a java.util.HashMap, which is not thread safe, when someone uses it as if it was. This was an interesting but hypothetical topic for me, because the problem is known and I’d been working on web services where we avoided sharing objects between threads.

Fast forward a few years and a couple of positions, and now I’m working on making a good-sized monolith able to scale. One day the operations team contacted me to say that one of our production servers was setting off alarms because its CPU usage was hitting 100%. When I got a thread dump from the problem server, what did I see but

  java.util.HashMap.getEntry(HashMap.java:347)
  java.util.HashMap.containsKey(HashMap.java:335)
  name.changed.to.protect.the.guilty.Example.getSomething(Example.java:1234)

(Unfortunately I can’t show the real code because it’s proprietary.)

There were multiple threads stuck in this state. Seeing the stack trace triggered just enough recollection that I was able to google up that blog post. Rereading it confirmed that the description matched, which pointed out where I needed to go to do the fix.

For our servers it seems that one stuck thread will cause one CPU to report full usage. Two stuck threads on a 4-CPU VM caused 50% CPU usage for the entire server, and four stuck threads on a 4-CPU VM caused 100% CPU usage for that server. The system was otherwise responsive, so these threads weren’t preventing other threads from getting CPU cycles. That was fortunate, because the only way to get the CPU usage back down is to restart the JVM, and the only way to prevent the problem from recurring is to fix the code and deploy a new version.

It’s straightforward to make sure you’re not passing HashMap objects from one thread to another. A case to watch out for is using a HashMap as the value in a cache – if two different threads both get cache hits on the same key, the HashMap value object ends up shared by those threads. Since the contents of a cache are intended to be shared, any value stored in a cache should be a ConcurrentHashMap or one of the available immutable Map implementations.
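
A small sketch of the difference (class and field names are made up):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheValueExamples {
    // Risky as a cache value: a plain HashMap read by multiple threads after cache hits.
    static final Map<String, String> plainValue = new HashMap<>();

    // Safe for concurrent readers (and writers):
    static final Map<String, String> concurrentValue = new ConcurrentHashMap<>();

    // Also safe, as long as the contents never change after construction:
    static final Map<String, String> immutableValue =
            Collections.unmodifiableMap(new HashMap<>(plainValue));
}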

Java 6 to Java 8 migration notes, part 2

Published / by Steve

Recently I’ve been working towards migrating a legacy application from Java 6 to Java 8. While the language and the JVM are backwards compatible, there have still been a few issues that I thought I would document in the hope that they might help someone else.

There are a number of unit tests which set up test data in the database before they start. (I know there are lots of reasons not to do that, but when an application has as few unit tests as this one does, you hang on to whatever you’ve got.) One of these tests passed consistently with Java 6, but failed with Java 8. The failure suggested a problem with the test data.

Here’s a snippet of the code. The methods are the traditional ones from the JUnit 3 days. In the debugger I could see that data was being created successfully by the setup method, so how could there be a problem with it?

@Override
@Before
public void setUp() {
    // initialization code...
    // more initialization code...
}

@Override
@Before
public void tearDown() {
    super.tearDown();
}

It turned out that the problem was right in front of me but hard to spot: the tear down method was annotated with @Before. What happens when there are two @Before methods in the same class? JUnit doesn’t guarantee the order they run in. If the tear down runs before the setup it’s okay – there’s just nothing actually deleted. But if the tear down runs after the setup, it cleans out all the test data just before the test runs.

How this code ran reliably with Java 6 I do not know; presumably it came down to the order in which reflection returned the methods, which is unspecified and changed between JVM versions. To fix it for Java 8, I actually just removed the tear down method, since (as you may have noticed) it didn’t do anything useful anyway.
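
Had the tear down actually done useful cleanup, the alternative fix would have been to give it the JUnit 4 annotation it was presumably meant to have (a sketch mirroring the snippet above):

@Override
@After   // org.junit.After: runs after each test instead of before it
public void tearDown() {
    super.tearDown();
}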