Much of software design
involves the ongoing question, where should this code go?
In particular, I would like to achieve three specific things with my code structure:
- Keep things that have to change together as close together in the code as possible.
- Allow unrelated things in the code to change independently (also known as orthogonality).
- Minimize duplication in the code.
To
achieve these three goals, I need some tools to help me know where new
code should go and some other tools to help me recognize when I've put
code in the wrong place.
By
and large, these goals are closely related to the classic code
qualities of cohesion and coupling. I achieve these goals by moving
toward higher cohesion and looser coupling. Of course, first we need to
understand what these qualities mean and why coupling and cohesion are
helpful concepts. Then I'd like to discuss some of what I'm going to
call "design vectors" that help us move toward better structures and
recognize when we need to move away from bad structures that have crept
into our code.
Just
as a note, I'm the primary developer of an open source inversion of
control (IOC) tool called StructureMap. I'm going to be using some
real-life examples from StructureMap to illustrate the design problems
that these vectors address. In other words, don't make the same mistakes
that I made.
Decrease Coupling
You
can't help but hear the terms "loose coupling" or "tight coupling" in
almost any discussion on software design. Coupling among classes or
subsystems is a measure of how interconnected those classes or
subsystems are. Tight coupling means that related classes have to know
internal details of each other, changes ripple through the system, and
the system is potentially harder to understand. Figure 1
shows a contrived sample of a business-processing module that is very
tightly coupled to various other concerns besides the business logic.
Figure 1 Tightly Coupled Code
public class BusinessLogicClass { public void DoSomething() { // Go get some configuration int threshold = int.Parse(ConfigurationManager.AppSettings["threshold"]); string connectionString = ConfigurationManager.AppSettings["connectionString"]; string sql = @"select * from things size > "; sql += threshold; using (SqlConnection connection = new SqlConnection(connectionString)) { connection.Open(); SqlCommand command = new SqlCommand(sql, connection); using (SqlDataReader reader = command.ExecuteReader()) { while (reader.Read()) { string name = reader["Name"].ToString(); string destination = reader["destination"].ToString(); // do some business logic in here doSomeBusinessLogic(name, destination, connection); } } } } }
Let's
say that what we really care about is the actual business processing,
but our business logic code is intertwined with data-access concerns and
configuration settings. So what's potentially wrong with this style of
code?
The
first problem is that the code is somewhat hard to understand because
of the way the different concerns are intertwined. I'll discuss this
further in the next section on cohesion.
The
second problem is that any changes in data-access strategy, database
structure, or configuration strategies will ripple through the business
logic code as well because it's all in one code file. This business
logic knows too much about the underlying infrastructure.
Third,
we can't reuse the business logic code independent of the specific
database structure or without the existence of the AppSettings keys. We
also can't reuse the data-access functionality embedded in the
BusinessLogicClass. That coupling between data access and business logic
might not be an issue, but what if we want to repurpose this business
logic for usage against data entered directly into an Excel spreadsheet
by analysts? What if we want to test or debug the business logic by
itself? We can't do any of that because the business logic is tightly
coupled to the data-access code. The business logic would be a lot
easier to change if we could isolate it from the other concerns.
To summarize, the goals behind achieving loose coupling between classes and modules are to:
- Make the code easier to read.
- Make our classes easier to consume by other developers by hiding the ugly inner workings of our classes behind well-designed APIs.
- Isolate potential changes to a small area of code.
- Reuse classes in completely new contexts.
Increase CohesionThe academic definition of cohesion is that it is a measure of how closely related all the responsibilities, data, and methods of a class are to each other. I like to think of cohesion as a measure of whether a class has a well-defined role within the system. We generally consider high cohesion to be a good thing and repeat the words "highly cohesive" like a mantra. But why?Let's think of coding as having a conversation with the computer. More accurately, we're having several simultaneous conversations with the computer. We're having conversations about how security is carried out, how the infrastructure concerns are supposed to behave, and what the business rules are.When you're at a loud party with a lot of different conversations going on at once, it can be difficult to focus on the one conversation you're trying to have. It's much easier to have a conversation in a quiet setting where there's only one conversation going on.An easy test for cohesion is to look at a class and decide whether all the contents of the class are directly related to and described by the name of the class—vague class names, such as InvoiceManager, do not count. If the class has responsibilities that don't relate to its name, those responsibilities probably belong to a different class. If you find a subset of methods and fields that could easily be grouped separately under another class name, then maybe you should extract these methods and fields to a new class.To illustrate, if you find methods and data in your TrainStation, Train, and Conductor classes that seem to most accurately fit the theme of the TrainSchedule class, move those methods and data into the TrainSchedule. One of my favorite sayings about design is applicable here: put code where you'd expect to find it. It's most logical to me that functionality involving a train schedule would be in the TrainSchedule class.To stretch my earlier conversation analogy, a system consisting of cohesive classes and subsystems is like a well-designed online discussion group. Each area in the online group is narrowly focused on one specific topic so the discussion is easy to follow, and if you're looking for a dialog on a certain subject, there's only one room you have to visit.
Eliminate Inappropriate IntimacyInappropriate intimacy refers to a method in a class that has too much intimate knowledge of another class. Inappropriate intimacy is a sign of harmful, tight coupling between classes. Let's say that we have a business logic class that calls an instance of class DataServer1 to get the data it needs for its business-logic processing. Figure 2 shows an example. In this case, the Process method has to know a lot of the inner workings of DataServer1 and a bit about the SqlDataReader class.Figure 2 Inappropriate Intimacypublic void Process() { string connectionString = getConnectionString(); SqlConnection connection = new SqlConnection(connectionString); DataServer1 server = new DataServer1(connection); int daysOld = 5; using (SqlDataReader reader = server.GetWorkItemData(daysOld)) { while (reader.Read()) { string name = reader.GetString(0); string location = reader.GetString(1); processItem(name, location); } } }
Now, let's rewrite the code in Figure 2 to remove the inappropriate intimacy:public void Process() { DataServer2 server = new DataServer2(); foreach (DataItem item in server.GetWorkItemData(5)) { processItem(item); } }
As you can see in this version of the code, I've encapsulated all the SqlConnection and SqlDataReader object manipulation inside the DataServer2 class. DataServer2 is also assumed to be taking care of its own configuration, so the new Process method doesn't have to know anything about setting up DataServer2. The DataItem objects returned from GetWorkItemData are also strongly typed objects.Now, let's analyze the two versions of Process method against some of the goals of loose coupling. First, what about making the code easier to read? The first and second versions of Process perform the same basic task, but which is easier to read and understand? Personally, I can more easily read and understand the business logic processing without wading through data-access code.How about making our classes easier to consume? Consumers of DataServer1 have to know how to create a SqlConnection object, understand the structure of the DataReader returned, and iterate through and clean up the DataReader. A consumer of DataServer2 only has to call a no-argument constructor and then call a single method that returns an array of strongly typed objects. DataServer2 is taking care of its own ADO.NET connection setup and cleaning up open DataReaders.And as for isolating potential changes to a small area of code? In the first version of the code, almost any change in the way DataServer works would impact the Process method. In the second, more encapsulated version of DataServer, you could switch the data store to an Oracle database or to an XML file without any effect on the Process method.
The Law of DemeterThe Law of Demeter is a design rule of thumb. The terse definition of the law is: only talk to your immediate friends. The Law of Demeter is a warning about the potential dangers of code, as in Figure 3.Figure 3 Breaking the Lawpublic interface DataService { InsuranceClaim[] FindClaims(Customer customer); } public class Repository { public DataService InnerService { get; set; } } public class ClassThatNeedsInsuranceClaim { private Repository _repository; public ClassThatNeedsInsuranceClaim(Repository repository) { _repository = repository; } public void TallyAllTheOutstandingClaims(Customer customer) { // This line of code violates the Law of Demeter InsuranceClaim[] claims = _repository.InnerService.FindClaims(customer); } }
The ClassThatNeedsInsuranceClaim class needs to get InsuranceClaim data. It has a reference to the Repository class, which itself has a DataService object. ClassThatNeedsInsuranceClaim reaches inside Repository to grab the inner DataService object, then calls Repository.FindClaims to get its data. Note that the call to _repository.InnerService.FindClaims(customer) is a clear violation of the Law of Demeter because ClassThatNeedsInsuranceClaim is directly calling a method on a property of its Repository field. Now please turn your attention to Figure 4, which shows another sample of the same code, but this time it is following the Law of Demeter.Figure 4 Better Decouplingpublic class Repository2 { private DataService _service; public Repository2(DataService service) { _service = service; } public InsuranceClaim[] FindClaims(Customer customer) { // we're simply going to delegate to the inner // DataService for now, but who knows what // we want to do in the future? return _service.FindClaims(customer); } } public class ClassThatNeedsInsuranceClaim2 { private Repository2 _repository; public ClassThatNeedsInsuranceClaim2( Repository2 repository) { _repository = repository; } public void TallyAllTheOutstandingClaims( Customer customer) { // This line of code now follows the Law of Demeter InsuranceClaim[] claims = _repository.FindClaims(customer); } }
What did we accomplish? Repository2 is easier to use than Repository because you have a direct method to call for the InsuranceClaim information. When we were violating the Law of Demeter, the consumers of Repository were tightly coupled to the implementation of Repository. With the revised code, I have more ability to change the Repository implementation to add more caching or swap out the underlying DataService with something completely different.The Law of Demeter is a powerful tool to help you spot potential coupling problems, but don't follow the Law of Demeter blindly. Violating the Law of Demeter does add tighter coupling to your system, but in some cases you may judge that the potential cost of coupling to a stable element of your code is less expensive than writing a lot of delegation code to eliminate the Law of Demeter violations.
Tell, Don't AskThe Tell, Don't Ask design principle urges you to tell objects what to do. What you don't want to do is ask an object about its internal state, make some decisions about that state, then tell that object what to do. Following the Tell, Don't Ask style of object interaction is a good way to ensure that responsibilities are put in the right places.Figure 5 illustrates a violation of Tell, Don't Ask. The code is taking a purchase of some sort, checking for the possibility of a discount for purchases of more than $10,000, and finally examining the account data to decide whether there are sufficient funds. The previous DumbPurchase and DumbAccount classes are, well, dumb. The account and purchase business rules are both encoded in ClassThatUsesDumbEntities.Figure 5 Asking Too Muchpublic class DumbPurchase { public double SubTotal { get; set; } public double Discount { get; set; } public double Total { get; set; } } public class DumbAccount { public double Balance { get; set;} } public class ClassThatUsesDumbEntities { public void MakePurchase( DumbPurchase purchase, DumbAccount account) { purchase.Discount = purchase.SubTotal > 10000 ? .10 : 0; purchase.Total = purchase.SubTotal*(1 - purchase.Discount); if (purchase.Total < account.Balance) { account.Balance -= purchase.Total; } else { rejectPurchase(purchase, "You don't have enough money."); } } }
This type of code can be problematic in a couple of ways. In a system like this, you potentially have duplication because the business rules for an entity are scattered in procedural code outside of the entities. You might duplicate logic unknowingly because it's not obvious where the previously written business logic would be.Figure 6 shows the same code, but this time following the Tell, Don't Ask pattern. In this code, I moved the business rules for purchasing and accounts into the Purchase and Account classes themselves. When we go to make a purchase, we just tell the Account class to deduct the purchase from itself. Account and Purchase know about themselves and their internal rules. All a consumer of Account has to know is to call the Account.Deduct(Purchase, PurchaseMessenger) method.Figure 6 Telling Your App What to Dopublic class Purchase { private readonly double _subTotal; public Purchase(double subTotal) { _subTotal = subTotal; } public double Total { get { double discount = _subTotal > 10000 ? .10 : 0; return _subTotal*(1 - discount); } } } public class Account { private double _balance; public void Deduct( Purchase purchase, PurchaseMessenger messenger) { if (purchase.Total < _balance) { _balance -= purchase.Total; } else { messenger.RejectPurchase(purchase, this); } } } public class ClassThatObeysTellDontAsk { public void MakePurchase( Purchase purchase, Account account) { PurchaseMessenger messenger = new PurchaseMessenger(); account.Deduct(purchase, messenger); } }
The Account and Purchase objects are easier to use because you don't have to know quite so much about these classes to execute our business logic. We've also potentially reduced duplication in the system. The business rules for Accounts and Purchases can be effortlessly reused throughout the system because those rules are inside the Account and Purchase classes instead of being buried inside the code that uses those classes. In addition, it should be easier to change the business rules for Purchases and Accounts as those rules are found in just one place in the system.Closely related to Tell, Don't Ask is the Information Expert pattern. If you have a new responsibility for your system, in what class should the new responsibility go? The Information Expert pattern asks, who knows the information necessary to fulfill this responsibility? In other words, your first candidate for any new responsibility is the class that already has the data fields affected by that responsibility. In the purchasing sample, the Purchase class knows the information about a purchase that you would use to determine any possible discount rates, so the Purchase class itself is the immediate candidate for calculating discount rates.
No comments:
Post a Comment