Monday, December 19, 2011

Patterns in Practice..Cohesion And Coupling

Much of software design involves the ongoing question, where should this code go?
In particular, I would like to achieve three specific things with my code structure:
  1. Keep things that have to change together as close together in the code as possible.
  2. Allow unrelated things in the code to change independently (also known as orthogonality).
  3. Minimize duplication in the code.
To achieve these three goals, I need some tools to help me know where new code should go and some other tools to help me recognize when I've put code in the wrong place.
By and large, these goals are closely related to the classic code qualities of cohesion and coupling. I achieve these goals by moving toward higher cohesion and looser coupling. Of course, first we need to understand what these qualities mean and why coupling and cohesion are helpful concepts. Then I'd like to discuss some of what I'm going to call "design vectors" that help us move toward better structures and recognize when we need to move away from bad structures that have crept into our code.
Just as a note, I'm the primary developer of an open source inversion of control (IOC) tool called StructureMap. I'm going to be using some real-life examples from StructureMap to illustrate the design problems that these vectors address. In other words, don't make the same mistakes that I made.

Decrease Coupling
You can't help but hear the terms "loose coupling" or "tight coupling" in almost any discussion on software design. Coupling among classes or subsystems is a measure of how interconnected those classes or subsystems are. Tight coupling means that related classes have to know internal details of each other, changes ripple through the system, and the system is potentially harder to understand. Figure 1 shows a contrived sample of a business-processing module that is very tightly coupled to various other concerns besides the business logic.
public class BusinessLogicClass {
  public void DoSomething()  {
    // Go get some configuration
    int threshold = 
      int.Parse(ConfigurationManager.AppSettings["threshold"]);

    string connectionString = 
      ConfigurationManager.AppSettings["connectionString"];

    string sql = 
      @"select * from things 
        size > ";

    sql += threshold;

    using (SqlConnection connection = 
      new SqlConnection(connectionString)) {

      connection.Open();

      SqlCommand command = new SqlCommand(sql, connection);
      using (SqlDataReader reader = command.ExecuteReader()) {
        while (reader.Read()) {
          string name = reader["Name"].ToString();
          string destination = reader["destination"].ToString();

          // do some business logic in here
          doSomeBusinessLogic(name, destination, connection);
        }  
      }
    }
  }
}
Let's say that what we really care about is the actual business processing, but our business logic code is intertwined with data-access concerns and configuration settings. So what's potentially wrong with this style of code?
The first problem is that the code is somewhat hard to understand because of the way the different concerns are intertwined. I'll discuss this further in the next section on cohesion.
The second problem is that any changes in data-access strategy, database structure, or configuration strategies will ripple through the business logic code as well because it's all in one code file. This business logic knows too much about the underlying infrastructure.
Third, we can't reuse the business logic code independent of the specific database structure or without the existence of the AppSettings keys. We also can't reuse the data-access functionality embedded in the BusinessLogicClass. That coupling between data access and business logic might not be an issue, but what if we want to repurpose this business logic for usage against data entered directly into an Excel spreadsheet by analysts? What if we want to test or debug the business logic by itself? We can't do any of that because the business logic is tightly coupled to the data-access code. The business logic would be a lot easier to change if we could isolate it from the other concerns.
To summarize, the goals behind achieving loose coupling between classes and modules are to:
  1. Make the code easier to read.
  2. Make our classes easier to consume by other developers by hiding the ugly inner workings of our classes behind well-designed APIs.
  3. Isolate potential changes to a small area of code.
  4. Reuse classes in completely new contexts.
  5. Code Smells
    It's obviously good to know how to do the right things when designing new code, but it might be even more important to recognize when your existing code or design has developed problems. Like inappropriate intimacy, "code smell" (defined by Martin Fowler in the book Refactoring: Improving the Design of Existing Code) is a tool that you can utilize to spot potential problems in code.
    A code smell is a sign that something may be wrong in your code. It doesn't mean that you need to rip out your existing code and throw it away on the spot, but you definitely need to take a closer look at the code that gives off the offending "smell." A code smell can be spotted by simple inspection. Learning about code smells is a powerful tool to remove little problems in your code and class design before those problems become bigger issues.
    Many, if not most, of the commonly described code smells are signs of poor cohesion or harmful tight coupling. Here are some other examples:
    Divergent Changes A single class that has to be changed in different ways for different reasons. This smell is a sign that the class is not cohesive. You might refactor this class to extract distinct responsibilities into new classes.
    Feature Envy A method in ClassA seems way too interested in the workings and data fields of ClassB. The feature envy from ClassA to ClassB is an indication of tight coupling from ClassA to ClassB. The usual fix is to try moving the functionality of the interested method in ClassA to ClassB, which is already closer to most of the data involved in the task.
    Shotgun Surgery A certain type of change in the system repeatedly leads to making lots of small changes to a group of classes. Shotgun surgery generally implies that a single logical idea or function is spread out over multiple classes. Try to fix this by pulling all the parts of the code that have to change together into a single cohesive class.

    Increase Cohesion
    The academic definition of cohesion is that it is a measure of how closely related all the responsibilities, data, and methods of a class are to each other. I like to think of cohesion as a measure of whether a class has a well-defined role within the system. We generally consider high cohesion to be a good thing and repeat the words "highly cohesive" like a mantra. But why?
    Let's think of coding as having a conversation with the computer. More accurately, we're having several simultaneous conversations with the computer. We're having conversations about how security is carried out, how the infrastructure concerns are supposed to behave, and what the business rules are.
    When you're at a loud party with a lot of different conversations going on at once, it can be difficult to focus on the one conversation you're trying to have. It's much easier to have a conversation in a quiet setting where there's only one conversation going on.
    An easy test for cohesion is to look at a class and decide whether all the contents of the class are directly related to and described by the name of the class—vague class names, such as InvoiceManager, do not count. If the class has responsibilities that don't relate to its name, those responsibilities probably belong to a different class. If you find a subset of methods and fields that could easily be grouped separately under another class name, then maybe you should extract these methods and fields to a new class.
    To illustrate, if you find methods and data in your TrainStation, Train, and Conductor classes that seem to most accurately fit the theme of the TrainSchedule class, move those methods and data into the TrainSchedule. One of my favorite sayings about design is applicable here: put code where you'd expect to find it. It's most logical to me that functionality involving a train schedule would be in the TrainSchedule class.
    To stretch my earlier conversation analogy, a system consisting of cohesive classes and subsystems is like a well-designed online discussion group. Each area in the online group is narrowly focused on one specific topic so the discussion is easy to follow, and if you're looking for a dialog on a certain subject, there's only one room you have to visit.

    Eliminate Inappropriate Intimacy
    Inappropriate intimacy refers to a method in a class that has too much intimate knowledge of another class. Inappropriate intimacy is a sign of harmful, tight coupling between classes. Let's say that we have a business logic class that calls an instance of class DataServer1 to get the data it needs for its business-logic processing. Figure 2 shows an example. In this case, the Process method has to know a lot of the inner workings of DataServer1 and a bit about the SqlDataReader class.
    public void Process() {
      string connectionString = getConnectionString();
      SqlConnection connection = new SqlConnection(connectionString);
      DataServer1 server = new DataServer1(connection);
    
      int daysOld = 5;
      using (SqlDataReader reader = server.GetWorkItemData(daysOld)) {
        while (reader.Read()) {
          string name = reader.GetString(0);
          string location = reader.GetString(1);
    
          processItem(name, location); 
        }
      }
    }
    Now, let's rewrite the code in Figure 2 to remove the inappropriate intimacy:
    public void Process() {
      DataServer2 server = new DataServer2();
      foreach (DataItem item in server.GetWorkItemData(5)) {
        processItem(item);
      }
    }
    As you can see in this version of the code, I've encapsulated all the SqlConnection and SqlDataReader object manipulation inside the DataServer2 class. DataServer2 is also assumed to be taking care of its own configuration, so the new Process method doesn't have to know anything about setting up DataServer2. The DataItem objects returned from GetWorkItemData are also strongly typed objects.
    Now, let's analyze the two versions of Process method against some of the goals of loose coupling. First, what about making the code easier to read? The first and second versions of Process perform the same basic task, but which is easier to read and understand? Personally, I can more easily read and understand the business logic processing without wading through data-access code.
    How about making our classes easier to consume? Consumers of DataServer1 have to know how to create a SqlConnection object, understand the structure of the DataReader returned, and iterate through and clean up the DataReader. A consumer of DataServer2 only has to call a no-argument constructor and then call a single method that returns an array of strongly typed objects. DataServer2 is taking care of its own ADO.NET connection setup and cleaning up open DataReaders.
    And as for isolating potential changes to a small area of code? In the first version of the code, almost any change in the way DataServer works would impact the Process method. In the second, more encapsulated version of DataServer, you could switch the data store to an Oracle database or to an XML file without any effect on the Process method.

    The Law of Demeter
    The Law of Demeter is a design rule of thumb. The terse definition of the law is: only talk to your immediate friends. The Law of Demeter is a warning about the potential dangers of code, as in Figure 3.
    public interface DataService {
      InsuranceClaim[] FindClaims(Customer customer);
    }
    
    public class Repository {
      public DataService InnerService { get; set; }
    }
    
    public class ClassThatNeedsInsuranceClaim {
      private Repository _repository;
    
      public ClassThatNeedsInsuranceClaim(Repository repository) {
        _repository = repository;
      }
    
      public void TallyAllTheOutstandingClaims(Customer customer) {
        // This line of code violates the Law of Demeter
        InsuranceClaim[] claims = 
          _repository.InnerService.FindClaims(customer);
      }
    }
    The ClassThatNeedsInsuranceClaim class needs to get InsuranceClaim data. It has a reference to the Repository class, which itself has a DataService object. ClassThatNeedsInsuranceClaim reaches inside Repository to grab the inner DataService object, then calls Repository.FindClaims to get its data. Note that the call to _repository.InnerService.FindClaims(customer) is a clear violation of the Law of Demeter because ClassThatNeedsInsuranceClaim is directly calling a method on a property of its Repository field. Now please turn your attention to Figure 4, which shows another sample of the same code, but this time it is following the Law of Demeter.
    public class Repository2 {
      private DataService _service;
    
      public Repository2(DataService service) {
        _service = service;
      }
    
      public InsuranceClaim[] FindClaims(Customer customer) {
        // we're simply going to delegate to the inner
        // DataService for now, but who knows what 
        // we want to do in the future?
        return _service.FindClaims(customer);
      }
    }
    
    public class ClassThatNeedsInsuranceClaim2 {
      private Repository2 _repository;
    
      public ClassThatNeedsInsuranceClaim2(
        Repository2 repository) {
        _repository = repository;
      }
    
      public void TallyAllTheOutstandingClaims(
        Customer customer) {
        // This line of code now follows the Law of Demeter
        InsuranceClaim[] claims = _repository.FindClaims(customer);
      }
    }
    What did we accomplish? Repository2 is easier to use than Repository because you have a direct method to call for the InsuranceClaim information. When we were violating the Law of Demeter, the consumers of Repository were tightly coupled to the implementation of Repository. With the revised code, I have more ability to change the Repository implementation to add more caching or swap out the underlying DataService with something completely different.
    The Law of Demeter is a powerful tool to help you spot potential coupling problems, but don't follow the Law of Demeter blindly. Violating the Law of Demeter does add tighter coupling to your system, but in some cases you may judge that the potential cost of coupling to a stable element of your code is less expensive than writing a lot of delegation code to eliminate the Law of Demeter violations.

    Tell, Don't Ask
    The Tell, Don't Ask design principle urges you to tell objects what to do. What you don't want to do is ask an object about its internal state, make some decisions about that state, then tell that object what to do. Following the Tell, Don't Ask style of object interaction is a good way to ensure that responsibilities are put in the right places.
    Figure 5 illustrates a violation of Tell, Don't Ask. The code is taking a purchase of some sort, checking for the possibility of a discount for purchases of more than $10,000, and finally examining the account data to decide whether there are sufficient funds. The previous DumbPurchase and DumbAccount classes are, well, dumb. The account and purchase business rules are both encoded in ClassThatUsesDumbEntities.
    public class DumbPurchase {
      public double SubTotal { get; set; }
      public double Discount { get; set; }
      public double Total { get; set; }
    }
    
    public class DumbAccount {
      public double Balance { get; set;}
    }
    
    public class ClassThatUsesDumbEntities {
      public void MakePurchase(
        DumbPurchase purchase, DumbAccount account) {
        purchase.Discount = purchase.SubTotal > 10000 ? .10 : 0;
        purchase.Total = 
          purchase.SubTotal*(1 - purchase.Discount);
    
        if (purchase.Total < account.Balance) {
          account.Balance -= purchase.Total;
        }
        else {
          rejectPurchase(purchase, 
            "You don't have enough money.");
        }
      }
    } 
    This type of code can be problematic in a couple of ways. In a system like this, you potentially have duplication because the business rules for an entity are scattered in procedural code outside of the entities. You might duplicate logic unknowingly because it's not obvious where the previously written business logic would be.
    Figure 6 shows the same code, but this time following the Tell, Don't Ask pattern. In this code, I moved the business rules for purchasing and accounts into the Purchase and Account classes themselves. When we go to make a purchase, we just tell the Account class to deduct the purchase from itself. Account and Purchase know about themselves and their internal rules. All a consumer of Account has to know is to call the Account.Deduct(Purchase, PurchaseMessenger) method.
    public class Purchase {
      private readonly double _subTotal;
    
      public Purchase(double subTotal) {
        _subTotal = subTotal;
      }
    
      public double Total {
        get {
          double discount = _subTotal > 10000 ? .10 : 0;
          return _subTotal*(1 - discount);
        }
      }
    }
    
    public class Account {
      private double _balance;
    
      public void Deduct(
        Purchase purchase, PurchaseMessenger messenger) {
        if (purchase.Total < _balance) {
          _balance -= purchase.Total;
        }
        else {
          messenger.RejectPurchase(purchase, this);
        }
      }
    }
    
    public class ClassThatObeysTellDontAsk {
      public void MakePurchase(
        Purchase purchase, Account account) {
        PurchaseMessenger messenger = new PurchaseMessenger();
        account.Deduct(purchase, messenger);
      }
    }
    The Account and Purchase objects are easier to use because you don't have to know quite so much about these classes to execute our business logic. We've also potentially reduced duplication in the system. The business rules for Accounts and Purchases can be effortlessly reused throughout the system because those rules are inside the Account and Purchase classes instead of being buried inside the code that uses those classes. In addition, it should be easier to change the business rules for Purchases and Accounts as those rules are found in just one place in the system.
    Closely related to Tell, Don't Ask is the Information Expert pattern. If you have a new responsibility for your system, in what class should the new responsibility go? The Information Expert pattern asks, who knows the information necessary to fulfill this responsibility? In other words, your first candidate for any new responsibility is the class that already has the data fields affected by that responsibility. In the purchasing sample, the Purchase class knows the information about a purchase that you would use to determine any possible discount rates, so the Purchase class itself is the immediate candidate for calculating discount rates.
     

No comments: