Archive for the ‘guidelines’ Category
How does the DRY Principal relate to a heavily layered architecture?
Today I’ve been thinking about how the DRY Principal (Do not Repeat Yourself) applies to an SOA or heavily layered architecture. A colleague of mine pointed out that our architecture violates this principal quite a lot, we didn’t really get the chance to dig into it, but I’ve been thinking about it quite a lot since.
The scenario that we’re dealing with is a shared database (shared with legacy apps). A shared back-end WCF service (split into many services, each with a service, business and data access layer). And several client application. With an architecture like this I think it’s very easy to fall into duplication, but it’s also worth differentiating between logic and definition.
As far as logic goes I think we’re pretty good at DRY; any real business logic goes in the WCF services and the clients are (in theory) pretty thin. When it comes to definitions I think we’re repeating ourselves a lot, but I’m not sure if this is necessarily a bad thing, because as soon as two areas of code share the same definition of something they’re both bound to that definition and therefore to each other (at least in a small way).
So let’s take an example of a definition of an aircraft that’s ‘repeated’ in several places, and in each place see if we can justify ‘repeating’ the definition:
- Database – The aircraft exists in the database, let’s take this as our initial declaration
- Data access – In the WCF service the aircraft exists in a dataset xsd – this acts as the mapping definition between the database and the business entities and is pretty much autogenerated anyway.
- Business entity – In the WCF service the aircraft is represented as a POCO business entity, this is seperate from the data access so that the business logic is decoupled from the pesistence technology. It’s also a much richer representation of the data that can be manipulated in code.
- Translator – Although not strictly it’s own definition a translator class translates between the business entity and the DTO, and ends up having to ‘know’ about the definition of an aircraft
- DTO – At the service layer the aircraft is represented as one or more data contracts. This is the contract that the WCF service exposes, therefore it’s a good idea to seperate it from the business entity so that the internals of the service can change without necessarily modifying the contract. Also a DTO is just for transferring the data, whereas the business entity can be a much richer representation.
- Client business entity – Some of the clients convert the DTO into a locally defined business entity, this is because some of the clients have quite rich behaviour themselves and decoupling from the DTOs (that are shared with the service and other clients) is advantageous.
- Client side translator – again a translator is needed to convert between the client side business entity and the DTO.
- View – Sometimes the UI needs a slightly different representation of an aircraft to display slightly different things, this occasionally leads to one last definition of aircraft.
Looking at the list it does initially look like we’ve repeated the definition a few too many times, and I’d probably concede that there are a few too many definitions; however none of the definitions are a straight forward repeat, each definition is there to provide decoupling, and is usually slightly different from the other definitions, even in cases where the definition is the same this is just ‘coincidence’ – as requirements change the different layers can diverge in what they think an aicraft looks like.
There are some places where I can see potential for reducing duplication, I think the two translators could be replaced with an automapper. The dataset could be replaced with a proper ORM, and possibly one that used convention over configuration.
So in summary I think that the DRY Principal is ‘good’ but that it needs to actually be thought about, and there are times when you want to ‘repeat’ yourself for another reason such as achieving decoupling.
Branching build scripts for production branches
At work we recently switched to using Team Foundation Server with a ‘proper’ branch/merge strategy. We also have improved our build process so that although it’s still a partly manual process, a document leads you through the process, and the process is repeatable as all the build scripts are source controlled and each build starts from a know starting point (VM with pre-reqs and build software).
Until very recently the builds were always run against ‘Main’, but we’ve now got a production branch, and I’ve been asked to modify the build scripts to get the source from the production branch rather than Main. To do this properly I think I need to do the following:
- In the production branch modify the source controlled build scripts to point to the correct source location.
- Create a copy of the build instructions specifically for this branch that say where to get the build scripts from (production branch)
To allow builds from Main and any Production branches both the document and the build scripts need to be branched.
Supporting plural and singular parameters
This problem relates to all APIs I would say, but I’ll give an example. You have a service layer, a business layer, and a data access layer. An operation on the service layer for getting instances of a class by Ids can either support passing in a single Id, or passing in plural Ids (in an array, IList, ICollection etc). The same choice is also available at the business layer, and again at the data access layer. So what do you choose at each layer?
If you choose just the singular then at some point someone is going to want to query for multiple Ids, and instead of creating a plural version of the method they may just call the singular version multiple times (very chatty).
If you treat all invocations as plural operations then the client is forced to create a plural parameter even though they may only have one value.
If you provide both at the service layer then should you have 2 versions of the operation all the way down the stack? Which will almost certainly lead to duplication, or do you convert all calls to multiple singular stacks, or a single plural stack? If you treat all calls as either option then you end up either missing performance improvements available by batch processing, or you add the complication of batch processing even when 100% of the calls are actually singular.
I’m not sure there is a ‘correct’ answer.
- I think you should only create the operation at the service layer when it is needed, so if you only need singular operation start with this. As soon as someone needs the batch version then the additional method signature should be created.
- If you do need both I think I would be tempted to only have both versions at the service layer (where we want to reduce network traffic). At lower levels I would begin by treating batch calls as multiple calls to the singular stack. The justification for this is that singular versions are almost always more simple, and that means more to me than some premature optimization. As soon as there is a performance bottle neck treating batch calls this way I would look at either having 2 routes down the stack or treating all as batch calls, with the added complexity of the batch operations having to consider the singular case.
Code review, before or after check-in?
Recently I’ve been thinking about our code review process, and how it could be improved. We’ve recently moved from a simple VSS based source control to a more feature rich TFS based approach.
We now have the majority of feature development occurring in several development branches, at the end of each development iteration we get all the changes reviewed and then merge the changes into our Main (or trunk); a build from Main will then be passed to test.
Problems we’re finding with this approach are:
Development Code Freeze
Whilst code is being reviewed (and review actions being actioned) we pretty much freeze feature development in the branch, the code review can often take a long time, so people often start doing further development off-line, ready to check-in once the freeze is lifted.
How can we make sure that what is eventually merged into Main is what was reviewed?
If the review is done against the changes between Dev and Main then you miss reviewing any merge conflict resolutions (unless you do a 2nd review). If you review the ‘pending merge’ (where someone does all the merge process apart from the final check-in to Main), you risk the merge no longer being valid (someone else checks into Main between pending merge and review complete).
Seeing very little benefit from the review
Our process of reviewing all code going into Main is giving us a false sense of confidence, as we think Main has top-quality code in it; in fact this is not the case, often it seems the only review comments that are actioned are fairly trivial changes that wont take very long. Any fairly major changes are often deferred to a future ‘Change Request’, with the merge going ahead anyway, as the feature is required.
We do want to prevent breaking changes going into production or even to test, but this should probably be caught by testing (automated) and Continuous Integration.
Code review is inconsistent
Code reviews are processed in a fairly inconsistent out-of-band way, with very little formal process around it. Review comments can sometimes be lost, and tracking whether the review was fully actioned etc is not 100% possible.
Suggestion: Do the code review after the check-in to main
So the suggestion I’ve been making is that instead of holding up the check-in to main with the code review we do the code review afterwards. The process would be something like this:
- Feature development is done in a development branch
- Towards the end of the iteration the code is prepared for check-in to main
- The code is merged into main
- The check-in to main causes a code review task to be created for the changeset
- Code review actions are done directly against main.
- Once the code review has all been resolved (actioned, further tasks raised etc). It is marked as closed.
- Before main is pushed to production all feature development code reviews must have been completed.
- Development Freeze – by decoupling review from development the two activities could run in parallel, as soon as the merge is complete development could resume. If merging up to a label you wouldn’t even have to freeze whilst the merge is done.
- Review exactly what was merged – by definition we would be reviewing what was merged. We could even say that review actions should be reviewed so no changes to main go unreviewed.
- Review benefit – by decoupling review from development and making the review process more traceable I think we will be able to add review actions to the product back-log and then even the big changes will be done. Review actions should be treated in a very similar way as feature development with priority dictating what is actually done.
- Review inconsistency – having a proper review process that is tracked with tasks will means a more consistent approach and a reliable process. Any unresolved review comments will be searchable/reportable just like any other tasks.
- I think we would risk checking in bad code to main, and although we could use our source control system to roll these back etc, we might not catch them in time – I don’t think, however, that our current process really stops this happening.
- Reviewing could be neglected in favour of feature development, but this just means that project management needs to see the importance of the review process and prioritise the tasks appropriately.
Accepted Best Practice
Before writing this post I tried to search for the best practice in this area and couldn’t really find any. I guess one answer is that instead of code reviewing paired programming could be more effective. Does any one have any suggestions as to what others are doing?
Defensive coding, interfaces and code noise.
Today I’ve been trying to balance several factors in my code following a code review. The first 2 are defensive coding and code noise; some people suggest that wherever an object is used it should first be checked for null, but then what do you do if it is null? sometimes the structure of your code ensures that it is never null. Also having all the null checks just introduces a lot of code noise. One solution would be to have non-nullable types (yes please!), but until that is possible in C# I need some guidelines as to when to check for null.
If an object is guaranteed to be non-null do not bother adding a check for null.
If an object is returned from a method on an interface then the return should always be checked for null.
If an object is returned from a method on a concrete class then the return should be checked for null unless the class has some sort of a guarantee that a null value will never be returned.
These last 2 guidelines show a difference between using an interface and a concrete instance. When you use an interface you can’t guarantee the behaviour of the implementing class, and due to the increased flexibility it would be a bad idea to make these assumptions based on the current implementation (particularly when the implementation is injected or similar). This makes me wonder whether I should spend more time balancing the use of interface dependencies and concrete dependencies.
Any comments welcome; what guidelines do you use for checking for nulls?
Leave a Comment
Leave a Comment
Leave a Comment
Why do our code reviews take such a long time?
Filed under: agile, comments, guidelines |
Kevin E. Schlabach has posted in response to my post on code reviewing. I almost decided to respond on my own blog, but thought it was more polite to comment there and just link to his post (and my reply) from here. The main point it made me think about is trying to decide why our code reviews take a long time. I’d be interested in other’s approaches to speeding up code reviews.