Sunday, November 4, 2012

Integration challenge: Handling duplicate messages

Challenge

Enterprise integrations often demand distributed message processing: responsibilities are spread across systems that handle messages at different processing stages. Integration often involves message queueing technologies that provide quality-of-service guarantees and decouple systems. As with all distributed technologies, the eight fallacies of distributed computing need to be taken care of. This can produce duplicate messages, or a requestor could mistakenly send two messages with the same information. Take the example of an order processing system that accepts orders from different channel applications in XML format. The order request is made up of order items, customer information and a coupon discount reference. The system needs to ensure that orders do not use the same coupon discount reference more than once. There are various reasons why duplicate orders could be sent to the system: network failures, system faults, a customer entering the order twice, and so on. What strategies do we have to handle this challenge?

Solution option

There are different strategies depending on how messages are processed: whether we have an integration platform that supports duplicate handling through configuration, or simply application servers running in clusters and handling messages in parallel.
The IT landscape we are in also makes a difference: do we have control over the producers of messages, and can we impose restrictions on how they produce them? It is good practice not to design for one specific client as the consumer of the service or capability. Idempotent Consumer is an integration pattern defined and used in many such scenarios. The solution has three parts:
  1. How do we identify unique messages?
  2. How do we maintain a history of processed messages?
  3. How do we preserve the transactional integrity of the consumer?
There needs to be a way to recognize a message's identity from a business-semantics point of view. The business context to which the message belongs must have a clear identifier that makes the message unique per client or across clients. It could be a single field/element or a combination of fields that defines uniqueness.
The next step is to compare the current message with past messages to detect duplicates. The history can be stored in different ways. One could keep the uniqueness identifier in an in-memory cache for a limited time to improve performance. Another option is to store the identifier in a database table with a uniqueness constraint on those fields, so that inserting a duplicate naturally throws an exception. There could also be a separate table with a unique column that stores a hashed value derived from the combination of fields forming the business-semantic identifier. With parallel consumers running in a cluster, duplicate messages may be processed concurrently; if the flow contains long transactions with steps such as web-service calls that do not participate in the global transaction, the second transaction may roll back, but the damage from the web-service call has already been done.
Hence, it makes sense to use an independent endpoint in the processing flow that filters duplicate messages and forwards only the unique ones.
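A minimal sketch of the database-backed check described above, assuming a MESSAGE_HISTORY table with a unique constraint on its MSG_HASH column (the table, column and field names are illustrative; Spring's JdbcTemplate is used only for brevity):

import java.security.MessageDigest;

import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;

public class MessageHistoryStore {

    private final JdbcTemplate jdbcTemplate;

    public MessageHistoryStore(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Returns true if this business key has been processed before.
    // The unique constraint makes the database the arbiter of duplicates,
    // even when several consumers insert in parallel.
    public boolean isDuplicate(String customerId, String couponRef) {
        String hash = hash(customerId + "|" + couponRef);
        try {
            jdbcTemplate.update("INSERT INTO MESSAGE_HISTORY (MSG_HASH) VALUES (?)", hash);
            return false;  // first time this key is seen
        } catch (DuplicateKeyException e) {
            return true;   // constraint violation: message already processed
        }
    }

    private String hash(String businessKey) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(businessKey.getBytes("UTF-8"))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}

If the insert and the business processing run in the same transaction, a rollback of the processing also removes the history entry, which keeps the check consistent.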
 
Can the broker infrastructure take this responsibility? It could maintain a real-time cache of messages currently being processed and compare the current message with the cached ones to identify duplicates, letting the application simply check its database for duplicates with a normal query. Some modern messaging systems, such as HornetQ, do provide support for this.
 
Depending on the architecture style one has, there are different ways to handle duplicates. An ESB style helps isolate this concern outside the application, within the ESB infrastructure. In a clustered application server environment such as WebSphere, one could use MDBs as endpoints on queues and write an idempotent message consumer that filters out duplicate messages.
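As a rough sketch of such an idempotent MDB consumer (the queue name, message properties and the history store are assumptions for illustration, not a prescribed design):

import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(mappedName = "jms/OrderRequestQueue")
public class OrderRequestMDB implements MessageListener {

    private MessageHistoryStore historyStore; // e.g. injected or looked up via JNDI

    public void onMessage(Message message) {
        try {
            // the business identifier is assumed to be carried as message properties
            String customerId = message.getStringProperty("customerId");
            String couponRef = message.getStringProperty("couponRef");

            if (historyStore.isDuplicate(customerId, couponRef)) {
                return; // silently drop the duplicate, or route it to an audit queue
            }
            // forward the unique message to the order processing flow here
        } catch (JMSException e) {
            throw new RuntimeException(e); // trigger rollback and redelivery
        }
    }
}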
 

Saturday, October 20, 2012

Global Delivery and life-span of IT Applications

Over the years the concept of an application has changed, especially with the introduction of new models such as SOA, SaaS and ESB. "Application" is a general term for a collection of modules, bundled and packaged as a deployment unit. There are applications for different domains: integration, web portals, messaging, batch, ETL, etc. Here we use the term mostly for business applications.

What is more challenging is that most applications bleed with problems as they grow with more functionality. Customers demand new functionality in short timeframes, expect the application to scale with volume growth, expect a move to multi-tenancy models; the demand for change is ever-constant. This becomes the key puzzle for delivery teams to solve. On top of this, the pressure of meeting business timelines, unclear requirements, environment readiness, skill deficiencies and, most importantly, the leadership challenge are predominant in the offshore-onshore global delivery model.

There are many theories on how to approach these challenges: agile methodologies, global delivery models, improved project management; the focus has been on applying different project management techniques. Over the years these techniques have been adopted without much success in terms of customer satisfaction. What could be the reasons for failure? Do we really talk openly about failures? How can we approach things differently? Does metrics-based delivery resolve the issues? Does agile alone resolve them? I have strong views on this topic and will be sharing them here.

I believe the life of an application is directly proportional to its design integrity. The most important design principle is modularity. Think of it as a Lego puzzle: joining the blocks in a controlled way creates better structures. Whether creating services in SOA, building web applications or any enterprise application, if the design is based on modular blocks and their dependencies are controlled efficiently, the application becomes maintainable, understandable and flexible. This gives the application a long life.

What, then, is stopping us from building such applications? I see many IT vendors take greenfield projects offshore in a global delivery model, and even more systems for maintenance, yet many end up as fragile systems that eventually bleed the customer's time and money. At a later stage customers start terming these applications legacy, start looking for replacement products and lose faith in custom-developed applications. So what does it take to improve on this state? The root cause lies in a lack of design discipline. More important still is a consistent mechanism to ensure design integrity is maintained. How do IT vendors ensure this? Does a bigger IT team, bigger test teams or bigger project management teams resolve it? No way!

It is important to realize that design and coding skills are more important than ever before. IT vendors need to invest in retaining the right skilled people to ensure quality is delivered. It is design and development skill that ensures a change to the current system is applied meaningfully and maintains design integrity. People with design skills know how to apply design principles, how the implementation is modularized, how extensibility is supported and how the application's design integrity is maintained. This is why object-oriented principles, service-oriented principles and agile methods are more relevant today than ever. Applying these principles ensures the evolution of modular applications over a long period of time, benefiting the customer in terms of change, efficiency and quality.
IT vendors need to focus on quality along with efficiency to provide customer satisfaction in a global delivery model. Key to this is people with the right skills, and that is the real challenge for IT vendors: as they grow bigger, their differentiating factors diminish. It is time for IT vendors to take a step back, re-focus on core capabilities and make them a real competitive advantage in the crowded outsourcing milieu.

Saturday, September 29, 2012

Five execution strategies for outsourced product development

In a world where outsourcing is the norm, the share of business applications outsourced for custom development, maintenance and support is, in my view, more than 70%. This is a huge change over the last decade or so. With all the positive impact it brings in cost reduction, flexibility and reduced time-to-market, it has also given birth to challenges in delivery quality, communication gaps, overheads, etc.

The most important challenge in outsourced custom application development engagements is to "manage design & architecture" of applications with respect to completeness, alignment with principles & guidelines, value realization, and ensuring robust applications in production. How do teams working in onshore-offshore models ensure the design is well built, the architecture is well communicated, quality is ensured and discipline is maintained? How do teams organize to ensure the architecture & design function is well represented and given due weight? Why do customers see delays in delivery for small changes? These are some questions that force customers and their vendor partners to re-look at their execution model.

From the perspective of software methodologies, every customer has their own customized processes that vendors follow with minor changes. Many still follow the standard waterfall model or some hybrid model to suit their working culture. There has been a noticeable rise in the adoption of agile methods by organizations, but they are yet to become mainstream in the outsourcing model. Here I focus on strategies that are proven to improve outsourced execution capabilities and result in successful deliveries for outsourced vendors.

  1. Get the skill mix right: For custom development engagements, having the right skill mix is very important to ensure you start on the right foot. Always engage a Solution/Application Architect upfront to benefit in the long run. It is often noticed that Business Analysts perform the role of Solution Architect due to unavailability of people, yet it is this role that is key to mapping requirements to implementation and making sure we align with business needs. For complex engagements it is recommended to have the Solution Architect working close to the business, while a Technology Specialist works with the offshore teams.
  2. Get the method right: By method I mean the method of delivery for the architecture & design process. Architects need to use innovative ways to represent and communicate design to the team, the business and other stakeholders. Architects need to take the lead in modelling design decisions, technology selection decisions, highlighting risks and challenging the business on requirements if needed. All of this needs to be documented and shared with stakeholders. If one follows an agile approach the design is part of the implementation and validation happens with implementation, but it also needs agile modelling to help communicate the design approach and decisions to remote teams.
  3. Get discipline in the team: At the end of it all, it is people that make the team, and their collaboration and synergy bring success to the engagement. Hence it is very important to build discipline. This needs a leader, coach, mentor and, more importantly, a passionate person to instil good practices in the team. Introduce good practices for requirements management, source code management, build management, collaborative design and prototyping, release management, and automated testing. Successful execution requires discipline and motivation in the team. Discipline brings common focus and builds synergy, but it is the most difficult job as a leader.
  4. Get to a hybrid-agile approach: Agile approaches have begun to be accepted by outsourcing vendors, but adopting pure agile is difficult when the customer and vendor are not ready for it. Often the customer team's processes are too rigid to take the plunge. Many customers are in this situation currently, but to improve the value of outsourcing it is recommended to adopt a hybrid-agile approach: shorter execution cycles (iterations), concise requirements documentation, automated test and build processes, regular customer feedback on deliveries, and complete visibility of effort spent. Customer teams that are rigid and not geared to work in an agile model have to change, to persuade vendors to work more with agile practices and realize better value from outsourcing.
  5. Get hands-on Architects: The role of the Architect needs to change to include a more hands-on approach with the teams. This is very important in the starting phase, where the design is shaping up and decisions are made with a long-term outlook. The average experience of teams at outsourced vendors is relatively low, which puts much pressure on getting a good start in the early phases. Having a hands-on Architect is very important in critical projects. The Architect has to define and develop interfaces, modularize the important architectural components, design framework fundamentals and guidelines, and establish strategies for important requirements and cross-cutting concerns. This reduces pressure on developers, who can focus on implementing functionality.
In my experience working as an Architect for vendors, I find the above strategies very effective. Value realization in outsourcing is high when the quality of delivery is good. Most vendors try to maximize effort, focus on maintaining a good perception of the engagement and keep the customer's team in a comfort zone, but take a relaxed approach to delivery quality. It is for customers to build a performance evaluation framework to keep vendors on their toes. Adopting the above practices will benefit both customers and vendors and help realize a win-win relationship. At the same time vendors need to respect people's skills and adopt people-centric policies to get consistent results. Eventually these strategies have the potential to become a strategic competitive advantage.

I do invite comments on the topic..

Sunday, September 23, 2012

Java Framework for Batch Applications

Batch applications are among the most common in enterprises today. They have a role even in the modern enterprise landscape, probably because they represent the real-life nature of work we do in organizations: end-of-day processing of cheques, scheduled processing of orders, settlement of hotel bills at the end of the day. Hence batch frameworks will remain relevant in advanced landscapes too. Looking at the typical enterprise milieu, there are many scenarios that demand once-a-day processing, or an even lower frequency, for example:
1. Report generation for management information,
2. Database synchronization at the end of the day,
3. Extraction of report information for industry feed etc.
Maybe the way it is applied will change. Many legacy systems and mainframes process information in a batch model to handle workloads efficiently.

In enterprises, there can be different mechanisms to handle batch processing. Typically, ETL tools are used in source-to-target information exchange that includes translation and data-quality needs; these tools provide out-of-the-box features to manage transformation. But there are also simple programs scheduled to run at specific intervals to handle batch processing, for example:
1. Standalone Java programs scheduled on Unix with Autosys or Unix scheduler
2. Standalone Unix shell scripts that are run on the Unix server periodically.
3. Database stored procedures that produce flat files for report generation. They can be efficient for data synchronization, being close to the target data store.

Let's look at the important capabilities needed in a batch application:
1. Ability to process large files within the available window time,
2. Ability to handle files or information in different formats,
3. Ability to process information in parallel, multi-threaded models,
4. Flexibility and extensibility to accommodate customized transformation, persistence and conversion logic,
5. Persistence of metadata to enable auditing, monitoring and management,
6. Ability to provide sync-points to enable restart,
7. Graceful handling of data errors and exceptions,
8. Support for remote monitoring, administration and manageability,
9. Support for testing batch models in a modular fashion.

Developing a custom batch program for the needs of the project is always an option, but having a consistent model and framework to build batch programs benefits the organization immensely in the long term. Open-source frameworks are maturing to fill this space within the enterprise. One among them is the Spring Batch 2.x framework, which is mature enough to be used in the enterprise landscape.

Spring Batch is an excellent framework for batch processing. It provides abstractions that represent the batch processing model in an easy, understandable, modular and reusable fashion. With its latest version, Spring Batch 2.x, it has built capabilities that place it among the top open source frameworks worth using in the enterprise landscape.

Following are positive features of the Spring Batch framework (a short sketch of the programming model follows the list):
1. It clearly defines separate layers: infrastructure, batch core and application, with abstractions that are extensible, reusable and testable.
2. Being part of the Spring family, it retains all the benefits of using Spring, with extended capabilities.
3. Support for batch state and metadata persistence, which helps with monitoring and restartability.
4. Out-of-the-box support for a variety of ItemReaders, ItemWriters and Tasklets.
5. Support for different scalability models: single-process multi-threaded, partitioned, multi-process and remote processing.
6. A well-defined XML-based domain language for batch applications that supports dependency injection, customization and reuse.
7. A batch data model that supports metadata persistence and monitoring.
8. Restartability: the ability to restart processing from the next data item in the source. This adds to the resilience of the solution, a much desired feature.
9. A variety of job launching options: command-line launch, and programmatic launch on message arrival or another event.
10. Monitoring and management of batch jobs for the job operator, supporting real-time monitoring.
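As a small illustration of the programming model (the class names, the job bean and the "businessDate" parameter are made up for this sketch; the job itself would be wired in the XML domain language), a step component and a programmatic launch could look like this:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemProcessor;

// A step component is a plain object against a small abstraction; the framework
// drives reading, chunking, transactions and restart state around it.
class UpperCaseLineProcessor implements ItemProcessor<String, String> {
    public String process(String line) throws Exception {
        // returning null filters the item out of the chunk entirely
        return line.trim().isEmpty() ? null : line.toUpperCase();
    }
}

// Programmatic launch (feature 9); the same job can equally be started from the
// command line with CommandLineJobRunner, or on message arrival.
class EndOfDayReportTrigger {
    private JobLauncher jobLauncher;  // injected from the Spring context
    private Job endOfDayReportJob;    // defined in the batch XML domain language

    public void run(String businessDate) throws Exception {
        jobLauncher.run(endOfDayReportJob,
                new JobParametersBuilder()
                        .addString("businessDate", businessDate)
                        .toJobParameters());
    }
}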

Spring Batch has been applied in the following scenarios:

1. Scheduled report generation: producing end-of-day management reports from an operational data store, integrating with the report engine and the source data store.
2. Reference data update: fetching and updating reference data in a reference store periodically; with XML as the data exchange format, XML elements are processed as data items.
3. Partitioned, event-based processing: batch jobs launched on the arrival of messages. In a partitioned and staged design, processing is split into stages, which provides resilience and restartability while processing large files.

There are a few cases where one needs to extend the support provided in the framework. It is very flexible, being POJO based and leveraging capabilities from the Spring family (AOP, configuration, integration and context management); it is a powerful framework with wide applicability.
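For instance, a custom reader is just a small POJO. The sketch below is a hypothetical reader over an in-memory collection; in real cases one would wrap a web service or a store that has no out-of-box reader.

import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class CollectionItemReader<T> implements ItemReader<T> {

    private final Iterator<T> iterator;

    public CollectionItemReader(List<T> items) {
        this.iterator = items.iterator();
    }

    public T read() throws Exception {
        // returning null signals the step that the input is exhausted
        return iterator.hasNext() ? iterator.next() : null;
    }
}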

We see the following benefits in using this framework:

1. Reduced risk for the project: using the right framework for the job reduces the chances of going down the wrong path.
2. Reduced time to delivery, thanks to the infrastructure and batch domain abstractions providing out-of-the-box reuse of capabilities.
3. A large, active community base that supports product adoption, enhancement and shared learning.
4. A much needed standard model for batch processing within the enterprise that promotes reuse, reduces maintenance effort, supports extension and gives the application a longer life.
5. Being open source, it reduces the cost of adoption; there is no licensing cost.

Some of the features that could be added to this framework are:

1. Distributed and remote monitoring and management of Batch applications in centralized fashion.
2. Enhanced support for execution environments such as Hadoop and the SEDA model,
3. Flexibility in applying multi-threaded processing within a Step.
4. Remote Administration.

It has wide applicability, provided we leverage its abstraction model and comply with its rules of usage with respect to scalability and persistence. The following resources are useful for readers:

Batch processing strategies
Scaling with Grid

Anti-pattern - Using ESB or Integration Broker as Application platform

The SOA style of architecture is gradually being adopted by the industry. Organizations have started to view their business capabilities as a set of well-managed services that are modular and reusable in different business processes. Business is slowly realizing the value of SOA with respect to reuse, business-IT alignment and agility.

Product vendors have responded swiftly by introducing SOA-enabling products that help expedite the adoption process. Many ESB & broker products have been introduced that are expected to provide loose coupling by placing an intermediary that handles integration concerns between business applications.

Some of the commercial products are:
 1. IBM WebSphere Message Broker, Websphere ESB, WebSphere MQ.
 2. TIBCO Businessworks,
 3. Webmethods Broker, Integration Server,
 4. Oracle Service Bus
 5. Sonic ESB, etc.

Apart from the commercial products, there are open source products and lightweight frameworks that can play the role of an ESB, e.g. Mule, ServiceMix, etc.

Often, I have seen organizations use ESB or integration platforms as application platforms. This means they use them to build business applications that produce and consume inter-application messages. To me this is an anti-pattern, where integration platforms are loaded with business logic, complex persistence flows and business decisions. It is a kind of bad smell in the design and usage of the system. I see the following disadvantages in this model.

1. The purpose of an ESB is to be integration middleware, decoupling producer and consumer applications in many ways.
2. ESBs help enforce separation of concerns between integration logic and business logic. All integration logic is isolated in the integration environment, so that if it changes, the change is limited to the integration platform and not the applications.
3. ESBs are appropriate for connectivity, translations, enrichments and validations in the context of message processing. When we start putting applications and business logic inside the ESB, it starts to smell and adds complexity to manage.
4. Most commercial vendors provide a message flow framework to write mediators. Sequencing activities within mediators creates a message processing flow that links the source application to the target application. Adding business logic, entity persistence, rules and decisions, and application control logic to flows defeats the purpose of separation of concerns, and increases the rigidity and complexity of applications.
5. ESBs are not made for writing applications, which need complex persistence mechanisms, caching support, data-structure support and other application frameworks.
6. Business applications are best written in good object-oriented languages, leveraging application frameworks, containers, management tools, etc. For example, building an order processing engine on a broker or ESB is not a good idea: it defeats the ESB's purpose and does not leverage application frameworks.
7. Applications are best written on OO platforms or frameworks such as Java, .NET or Ruby on Rails, and deployed on application servers such as WebSphere Application Server, JBoss or Tomcat. The advantage of OO, when applied with good abstractions, is huge: well-defined, modular, maintainable, reusable, component-based applications.

I have often seen organizations use broker platforms to write applications that not only process messages but also host business logic, business entities and their persistence. I find this an anti-pattern. Probable reasons they take this decision are:

1. Immaturity of decision making,
2. Organizations not spending effort on evaluating architectural and platform decisions,
3. Not enough understanding and appreciation of object-oriented platforms,
4. Ivory-tower architects making decisions without practical experience on the platforms,
5. Not enough experience with agile and object-oriented principles and processes,
6. Not enough clarity on the usage of architectural styles and their fit.

I would recommend reading the following articles to understand the fundamentals of the ESB style of architecture.

Mark Richards on ESB
Fundamentals of ESB

I do welcome comments on this topic, please write your thoughts..

Tuesday, February 28, 2012

jBPM and scheduler components

jBPM 4.x has an inbuilt timer component that can be used for standalone deployments. But when it comes to production deployments on JEE servers like WebSphere or WebLogic, it is NOT prudent to use it, as it uses custom threads that do not share the container context. Due to this there can be issues related to transaction management, thread safety and database locks.
See related issues:
Un-managed threads
Issues with websphere

So there is a need to customize the timer service to suit production deployments. One option is to use the most popular open source scheduler library/framework, Quartz.

Alternatively, one could use the WebSphere-provided scheduler service that is available via JNDI and is persistent, thread-safe, transaction-aware and easy to configure. The only problem is portability.

Quartz 1.8.x is the most popular version in use. The recent 2.x versions are compatible with the Java 1.5 threading mechanisms. Here is an approach to using Quartz 1.8.x with jBPM 4.x, along with the Spring 3.x framework.

Assuming you have already integrated Spring with jBPM: wherever you need timer functionality, you use a custom state activity in jBPM. This activity uses the Spring scheduler factory bean to schedule a Quartz job with the configured time at run time. The activity then goes into a wait state until the trigger fires or the job is cancelled. Here is a program sample of a custom scheduler activity in jBPM.

public class CustomScheduler implements ExternalActivityBehaviour {

    private static final long serialVersionUID = 1L;
    private Scheduler scheduler; // injected Quartz scheduler

    public void execute(ActivityExecution execution) throws Exception {
        String executionId = execution.getId();

        // schedule a Quartz job that will signal this execution when the timer fires
        // (TimerJob and its timing parameters are application-specific)
        scheduler.scheduleJob(new TimerJob(executionId, timeInSecs /* , .. */));
        execution.waitForSignal();
    }

    public void signal(ActivityExecution execution, String signalName,
            Map<String, ?> parameters) throws Exception {
        execution.take(signalName);
    }
}


This is a simple mechanism to introduce a custom timer. The drawback is that it needs improvement to provide better semantics and XML domain elements for ease of use. This could be achieved by extending the current implementation of the timer service and its elements to use the Quartz framework.

There is one important change needed in the existing Quartz 1.8.x framework to allow it to work with the Spring framework and use the WebSphere-provided container threads. The steps are:
1. Apply the patch on the 1.8.x code as given here
2. Configure Spring to use Quartz along with WebSphere WorkManager threads
3. Use a global transaction to create jobs
4. Provide a non-XA datasource in the scheduler factory configuration.

This should help you work with spring + quartz 1.8.x + Websphere 6.1 smoothly.
Here is the sample configuration:

<bean name="engineScheduler" class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
<property name="schedulerName">
<value>MyScheduler</value>
</property>
<property name="dataSource">
<ref bean="dataBaseSource">
</ref>
<property name="nonTransactionalDataSource">
<ref bean="nonXADatasource">
</ref>
<property name="startupDelay">
<value>20</value>
</property>
<property name="waitForJobsToCompleteOnShutdown">
<value>true</value>
</property>
<property name="applicationContextSchedulerContextKey">
<value>applicationContext</value>
</property>
<property name="taskExecutor" ref="scheculerTaskExecutor">
<property name="transactionManager" ref="transactionManager">
<property name="quartzProperties">
<props>
<prop key="org.quartz.threadPool.class">org.springframework.scheduling.quartz.LocalTaskExecutorThreadPool</prop>
<prop key="org.quartz.scheduler.instanceId">AUTO</prop>
<prop key="org.quartz.scheduler.threadsInheritContextClassLoaderOfInitializer">true</prop>
<prop key="org.quartz.jobStore.class">org.springframework.scheduling.quartz.LocalDataSourceJobStore</prop>
<prop key="org.quartz.jobStore.isClustered">true</prop>
<prop key="org.quartz.jobStore.driverDelegateClass">org.quartz.impl.jdbcjobstore.oracle.OracleDelegate</prop>
</props>
</property>
</property>

It is important to provide the non-XA datasource, as below:

<bean id="nonXADatasource" class="oracle.jdbc.pool.OracleDataSource" method="close">
<property name="connectionCachingEnabled" value="true"/>
<property name="jdbcUrl" value="jdbc:oracle:thin:@localhost:1521:XE"/>
<property name="user" value="username"/>
<property name="password" value="secret">
<property name="minPoolSize" value="5"/>
<property name="maxPoolSize" value="20"/>
<property name="acquireIncrement" value="1"/>
<property name="connectionCacheProperties"/>
<props merge="default">
<prop key="MinLimit">3</prop>
<prop key="MaxLimit">20</prop>
</props>
</property>
</bean>