
Is Enterprise Service Bus a "code smell"?

 
chris webster
Bartender
I'm a developer so I tend to have a limited perspective on/understanding of architectural matters. But my organisation is currently in the early stages of planning a big new application and the architects seem keen to apply an ESB approach to this. I can't go into the details of the system, but basically it will be taking in fairly large volumes (10s of TB?) of data via browser input and/or OCR documents over a period of weeks with likely big spikes on certain days, storing it, validating it and passing it through a pipeline for further processing, transformation and analysis. The initial storage might be a fast-write DB such as MongoDB, while validated/derived data might be stored in Hadoop, an RDBMS or some other scalable data-store. The initial web-based data capture will be interactive with potentially millions of web users and it needs to be responsive, while some of the later stages would be suitable for batch processing if necessary. Reporting and analysis applications may not all need to be implemented at the same time, but the core system would need to be able to support these in due course, with tens/low hundreds of users from a variety of clients e.g. browsers, third-party tools, BI etc.

This system is not going live for a few years yet, so there's plenty of time to refine the architecture, if we get the broad principles right up front. Right now, the architects seem to be looking at an ESB approach around SOA services for major components. Lots of boxes and layers, all looking rather monolithic to this untutored eye.

I've worked as a developer on heavy architectures for big Java EE systems before, and it was fairly nightmarish, not least because too little attention was paid to the brute mechanics of coping with large volumes of data and moving it through these complicated and chatty frameworks. These days I'm working on prototypes in the Big Data arena, so I'm a bit out of the loop on conventional enterprise architectures. But in my blissful ignorance, this monolithic ESB approach smells like it might be the wrong way to go, especially at a time when there seems to be a trend away from the ESB towards looser and more flexible frameworks of microservices, distributed data/processing and reactive design for concurrency and scalability. Given that this system will not be live until the end of the decade, is the ESB approach a tried-and-tested architectural solution that will still be equally valid then, or will it be likely to smell increasingly like dinosaur poop by the time it's fully implemented?

But what do I know, right? So I was wondering if you folk have a better practical insight into current trends in architectural approaches to large scale data-centric applications, and what the pros/cons of ESB might be.
 
Bartender
My employer has what I understand to be one of the largest ESBs, in terms of volume of data transmitted. There are easily hundreds, if not thousands, of applications that interact with the ESB on a daily basis. That said, I am not directly involved with the implementation, maintenance or strategy of it, but as far as I know, it isn't going anywhere.

chris webster wrote: dinosaur poop

Hey, now, dinosaurs ruled the earth for millions of years!
The main problem I see is that ESBs have quietly worked behind the scenes for years and therefore are not sexy and don't attract developer mindshare (see COBOL, mainframes, etc.). Kind of like plumbing.
As for alternatives, I'm reminded of Greenspun's tenth rule:

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.


I wonder if there's a corollary for ESBs.
 
Junilu Lacar
Sheriff
Search for problems with ESB, and in particular the warning from MuleSoft CTO Ross Mason. More later; my tablet is out of battery.
 
Junilu Lacar
Sheriff

chris webster wrote: I'm a developer so I tend to have a limited perspective on/understanding of architectural matters. But my organisation is currently in the early stages of planning a big new application and the architects seem keen to apply an ESB approach to this. ... So I was wondering if you folk have a better practical insight into current trends in architectural approaches to large scale data-centric applications, and what the pros/cons of ESB might be.

I am also "just a developer" but that doesn't seem to preclude me from having to make all sorts of architectural decisions. I might even say that I'm prone to making design decisions that lean towards the simpler rather than the grandiose.

One thing: do the architects in your organization spend at least 20% of their time every week working directly with code and the developers? Because if they don't, then that's probably one of the causes of the smell you're detecting. We call them "Visio Architects" and they're probably the worst kind of "architect" out there.
 
Junilu Lacar
Sheriff

chris webster wrote: or will it be likely to smell increasingly like dinosaur poop by the time it's fully implemented

Anything can be made to smell like poop with enough apathy and lack of diligence in writing clean code, ruthless refactoring, and continuous testing and integration as you go. That's why it's important for the architects to work with developers at least 20% of the time. One very successful group in my company requires their architects to spend at least 2 days every week programming and collaborating with tech leads and developers.
 
chris webster
Bartender
@joe

Thanks for the reply. It's good to know that this can be made to work on what sounds like a fairly epic scale. Although your comment about plumbing reminds me that my experience of EJB has been that you spend 90% of your time fighting with the plumbing and 10% working on the actual business requirements. But maybe that explains my antipathy to "ExB" generally. Anyway, I like your idea of implementing our own ESB in Lisp...

@junilu:

Thanks for those robust comments - it sounds like we have similar experiences of being on the wrong end of "architecture"! In my organisation, developers are regarded mainly as production line workers, while we have lots of architects who never code but supposedly take care of the "clever" stuff. This often works out about the way you'd expect: developers spend most of their time working around/against the architecture to achieve the business goals. Of course, these are my colleagues, and there are plenty of clever and hard-working individuals among them. But with the best will in the world, I'm sometimes reminded of Brian Foote's entertainingly harsh comments in an interview with InfoQ:

Brian Foote wrote:
I think one of the potential crimes against computer science that was inflicted on the community or against the programming community was that it exacerbated this estrangement between people who thought they were real architects and the coders; there came to be this view; you know, I used to kind of cruelly say they became a flaccid cabal of cartoon worshippers drooling over UML center folds; they became totally divorced from the code...


I agree that good developers should be thinking about architecture, and good architects should be coding regularly, but that's not my current working environment. My instinct about ESB is that it smells like overkill for this application and is liable to introduce a lot of complexity and constraints that we may not need. If we were to take a looser approach, e.g. one based on microservices, we would still have time to re-work that to fit into an ESB if it became necessary, so I'm inclined to go along with the MuleSoft guy: if we're not sure we need an ESB, we probably don't.

Either way, it's not like we developers get much input into these decisions around here. If possible, I think I'll get together with a couple of colleagues and see if we can get some kind of skunkworks thing going where we implement a simplified skeleton of the overall business process, maybe based on microservices but without ESB, to get a feel for where the pain-points are likely to be and maybe figure out if/where an ESB might help.

Thanks, guys.
 
Junilu Lacar
Sheriff

Brian Foote wrote:
I think one of the potential crimes against computer science that was inflicted on the community or against the programming community was that it exacerbated this estrangement between people who thought they were real architects and the coders; there came to be this view; you know, I used to kind of cruelly say they became a flaccid cabal of cartoon worshippers drooling over UML center folds; they became totally divorced from the code...


That's an awesome quote; I must share this with other tech leads in my group who are not "architects".

Your question comes at an uncannily opportune time for me, as I have just started looking at ESB as a possible way to reduce some complexity in an integration project I'm on that involves multiple enterprise applications. This project is actually supposed to be a "platform", although the current implementation seems to be all over the place, with our "architects" seemingly just slapping together solutions based on whatever buzzword-compliant technology is out there today. Smells a lot like resume padding, if you ask me. Then you have the Luddite cabal who insist on using Oracle RAC where something like MongoDB seems to be more appropriate. But when Oracle RAC can't cope with the tremendous volume and the you-know-what hits the fan, guess who's conspicuously not there to help fix things?
 
chris webster
Bartender
Here's another quote for you:

Martin Fowler wrote: ...Egregious Spaghetti Box...

That's his take on "ESB" in his brisk (30-minute) look at microservices in a keynote talk at GOTO Berlin in December 2014.

Seems "opinion is divided" on the merits of ESB, as they say.
 
Jayesh A Lalwani
Rancher
The term Big Data is kind of overused. Big Data could mean:

1) A lot of analytics done on small to medium-sized bits of data. This should really be called Big Computation. For example, one day's worth of Twitter feed is relatively small data: 500 million tweets a day is not terribly huge. So, say you are building an application that does some sort of analysis on real-time Twitter feeds, e.g. sentiment analysis on the feed. You need to do a lot of computation on each tweet, because you need to apply a lot of rules to each tweet, and you need to be able to do it in real time. At the end, you are not generating much output either. Personally, I call this "Big Computation"; however, the industry calls this kind of problem a Big Data problem.
2) A lot of data being processed
Let's say, on the other hand, you wanted to store all the tweets since the beginning of time, and do some sort of ad-hoc analysis on all of the data ("find tweets about Kim Kardashian in a black dress"). Now you are talking about a small amount of computation on a lot of data. You basically want to find tweets that mention "Kim Kardashian" and "black dress" (and its synonyms). That is very simple to do on a single tweet, but much harder to do on 100s of billions of tweets. This is what I call Big Data.
3) Some sort of hybrid
In other words, "find tweets that talk smack about Kim Kardashian in a black dress". Now you need to do sentiment analysis and pattern matching on all tweets, so you've got both. A lot of designs will solve this problem by splitting it in two. For example, you can have an engine that does sentiment analysis on every tweet that comes along and feeds a database, and then another system that does pattern matching on the database.
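The two-part split in item 3 can be sketched with Python's built-in sqlite3 module standing in for the database. This is only an illustration under stated assumptions: the tiny word-list sentiment scorer, the table layout and the function names are all hypothetical, and a real system would use a proper sentiment model and a store that scales.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Toy word lists standing in for a real sentiment model (hypothetical).
NEGATIVE = {"awful", "ugly", "worst"}
POSITIVE = {"great", "stunning", "love"}

def sentiment(text: str) -> int:
    """Crude score: number of positive words minus number of negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def stage1_score_and_store(db: sqlite3.Connection, stream: Iterable[Tuple[int, str]]) -> None:
    """Big Computation half: score every tweet as it arrives and persist the result."""
    for tweet_id, text in stream:
        db.execute("INSERT INTO tweets (id, text, sentiment) VALUES (?, ?, ?)",
                   (tweet_id, text, sentiment(text)))
    db.commit()

def stage2_find_smack_talk(db: sqlite3.Connection) -> List[int]:
    """Big Data half: cheap pattern matching over everything stored so far.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    """
    rows = db.execute(
        "SELECT id FROM tweets WHERE sentiment < 0 "
        "AND text LIKE '%kim kardashian%' AND text LIKE '%black dress%' "
        "ORDER BY id")
    return [row[0] for row in rows]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT, sentiment INTEGER)")
stage1_score_and_store(db, [
    (1, "Kim Kardashian in a black dress looks awful"),
    (2, "Kim Kardashian in a black dress is stunning"),
    (3, "worst game ever"),
])
print(stage2_find_smack_talk(db))  # [1]
```

The two stages only share the table, so either side can be scaled or replaced independently.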

For simplicity's sake, let's assume that a hybrid problem can be broken down into two problems, one that requires Big Computation and another that requires Big Data.

An ESB design works well for Big Computation problems. Since relatively small amounts of data get heavily processed, it's better to break your computation into steps and have different machines do different steps. So, in our example above, if your sentiment analysis needs to run each tweet through 100 rules, it makes sense to have one cluster of computers evaluate 40 of those rules and another evaluate the other 60, and to simply broadcast the tweets over an ESB. If you have steps that depend on other steps, you can simply keep adding more receivers to the bus. It makes sense to send data to the code, since you are not pushing a lot of data over the bus.
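A minimal in-process sketch of the broadcast idea, assuming a toy `MessageBus` class in place of a real ESB product (the class, the topic names and the 100 keyword rules are all hypothetical). Each tweet is published once; two receivers standing in for the two clusters each evaluate their own slice of the rules (40 and 60), and a collector subscribed to a second topic merges the partial results.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

class MessageBus:
    """Toy in-process publish/subscribe bus standing in for an ESB (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Broadcast: every subscriber on the topic gets its own copy of the message.
        for handler in list(self._subscribers[topic]):
            handler(dict(message))

def keyword_rule(word: str) -> Callable[[str], bool]:
    # Hypothetical rule: does the tweet contain this exact word?
    return lambda text: word in text.lower().split()

RULES = [keyword_rule(f"w{i}") for i in range(100)]  # 100 made-up rules

def rule_worker(bus: MessageBus, first: int, last: int, name: str) -> Handler:
    """A receiver that evaluates only rules first..last-1 and reports its hits."""
    def handle(msg: dict) -> None:
        hits = [i for i in range(first, last) if RULES[i](msg["text"])]
        bus.publish("partial-results", {"id": msg["id"], "worker": name, "hits": hits})
    return handle

class Collector:
    """Merges partial results once every worker has reported on a tweet."""

    def __init__(self, expected_workers: int) -> None:
        self.expected = expected_workers
        self.partials: Dict[int, Dict[str, List[int]]] = defaultdict(dict)
        self.results: Dict[int, List[int]] = {}

    def handle(self, msg: dict) -> None:
        parts = self.partials[msg["id"]]
        parts[msg["worker"]] = msg["hits"]
        if len(parts) == self.expected:
            self.results[msg["id"]] = sorted(h for hits in parts.values() for h in hits)
            del self.partials[msg["id"]]

bus = MessageBus()
collector = Collector(expected_workers=2)
bus.subscribe("partial-results", collector.handle)
bus.subscribe("tweets", rule_worker(bus, 0, 40, "cluster-A"))    # 40 rules
bus.subscribe("tweets", rule_worker(bus, 40, 100, "cluster-B"))  # the other 60

bus.publish("tweets", {"id": 1, "text": "w3 and w57 trending"})
bus.publish("tweets", {"id": 2, "text": "nothing to see here"})
print(collector.results)  # {1: [3, 57], 2: []}
```

Adding a step that depends on earlier results is just another `subscribe` call on the relevant topic. Note that every receiver gets the whole message, which is cheap here only because tweets are small.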

However, if you are talking about processing a lot of data, it makes more sense to send code to data. You simply load the data in memory, spread it across multiple nodes, and then send code to the node that processes the data and spits out the results that get aggregated upstream. Essentially, this is what you do when you use Hadoop.
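The "send code to data" idea, sketched under the assumption that each hypothetical node already holds its own partition of tweets in memory. Threads stand in for cluster nodes, and the partition contents, the synonym list and the function names are made up for illustration. Only the small counting function is dispatched to each partition, and only the per-node counts travel back to be aggregated, which is the same shape as a map/reduce job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical cluster: each "node" already holds its own partition in memory.
PARTITIONS: Dict[str, List[str]] = {
    "node-1": ["Kim Kardashian in a black dress", "weather is nice today"],
    "node-2": ["black gown on Kim Kardashian tonight", "sports news"],
    "node-3": ["Kim Kardashian wore red", "another black dress post"],
}

DRESS_SYNONYMS = ("black dress", "black gown")  # made-up synonym list

def matches(tweet: str) -> bool:
    # Cheap per-tweet predicate: trivial on one tweet, costly only at scale.
    text = tweet.lower()
    return "kim kardashian" in text and any(s in text for s in DRESS_SYNONYMS)

def count_matches(partition: List[str]) -> Tuple[int, int]:
    """The 'code' shipped to each node: returns (matching tweets, tweets scanned)."""
    return sum(1 for t in partition if matches(t)), len(partition)

def run_everywhere(task: Callable[[List[str]], Tuple[int, int]]) -> Tuple[int, int]:
    # Dispatch the task to every partition in parallel (threads stand in for nodes),
    # then aggregate the small per-node results upstream.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(task, PARTITIONS.values()))
    return sum(m for m, _ in partials), sum(n for _, n in partials)

print(run_everywhere(count_matches))  # (2, 6)
```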

So, going back to whether ESB is a good choice for a Big Data project, the answer is: it depends. ESB certainly has its strengths with respect to flexibility, scalability and fail-over. However, done badly, it can lead to bottlenecks. Done really badly, it can make the network itself the bottleneck, with your application spending most of its time transferring data over the bus. If that's happening, you might as well rip the whole thing out.
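A back-of-envelope estimate shows how quickly the bus can become the bottleneck. Assuming, hypothetically, about 10 TB of data (the lower end of the "10s of TB" mentioned at the start of the thread) and an ideal 10 Gbit/s link running flat out with no protocol or serialization overhead, each full pass of the data over the bus costs roughly 2.2 hours, and a pipeline that re-sends everything across several hops multiplies that.

```python
def transfer_hours(terabytes: float, link_gbit_per_s: float, hops: int = 1) -> float:
    """Idealised time, in hours, to push a dataset over a link `hops` times.

    Assumes decimal units (1 TB = 10**12 bytes), a fully utilised link and no
    protocol, serialisation or retry overhead, so real figures will be worse.
    """
    bits = terabytes * 1e12 * 8
    seconds_per_hop = bits / (link_gbit_per_s * 1e9)
    return hops * seconds_per_hop / 3600

print(round(transfer_hours(10, 10), 2))          # 2.22 hours for one pass
print(round(transfer_hours(10, 10, hops=5), 1))  # 11.1 hours if five stages each re-send it
```

Shipping a few kilobytes of code to where the data already sits avoids most of that cost, which is the point of the Hadoop-style approach.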

 
chris webster
Bartender
Thanks for your thoughtful reply, Jayesh.

I don't think this application necessarily classifies as a "Big Data" project in terms of the 3 Vs, although the volumes are fairly high and the velocity for the data capture phase will be fairly high too. In your more precise terms, this looks more like a Big Data front-end, with - perhaps - a Big Computation back-end. The data will be collected over a relatively short period (several TB over a couple of weeks), and then it will be passed through various validation and transformation processes, before being aggregated for subsequent analysis and reporting via other applications later on. The detailed requirements for those downstream applications don't exist yet, so all I know at this stage is that the core application must be able to support that kind of thing eventually.

As a developer, if I were starting to build this today, I'd focus on the initial data acquisition, validation and storage. For example:

  • The data structures are a good fit to JSON, so I might choose to implement the web front-end with something like Angular, the web servers with something that has good support for reactive concurrency e.g. Play, and I'd probably use MongoDB as the initial store for incoming data because it's pretty fast for writing data and fits the data model anyway. That way it's JSON all the way - no XML transformations required - and all the major components should be scalable and responsive to short term peaks.
  • The downstream validation/transformation could be batch processes, or they could be fed via asynchronous reactive streams of some kind pulling data from the initial MongoDB store (maybe using Apache Spark). I might choose some flavour of Hadoop for the longer-term store, because it is scalable, flexible and we are less concerned with response times in this part of the pipeline - we won't have a million web clients waiting for their browsers to respond at this point. Once the data arrives in Hadoop, it can stay there and - as you rightly say - we can focus on shipping the code to the data wherever possible.
  • The aggregation and analysis might be another good place to use Apache Spark (perhaps over YARN on the Hadoop cluster), which is also scalable, pretty fast and very flexible. Subsequent downstream applications could be implemented using anything that can talk to Hadoop, which is pretty much anything.
  • The major service interfaces within the application might be based on HTTP/REST using something lightweight like Spray, mostly passing parameters to processing in Spark or some other distributed framework, but I'm not sure about the options here.

Of course, all this is based on my very limited knowledge of today's technologies, not what might be around when this system goes live 5 years from now! But the basic features of the system seem fairly clear at this stage, even if the technology choices might vary.

So my concern about the early focus on ESB is that it seems to be trying to solve an orchestration problem that isn't really there. Most of this processing is essentially a one-way data pipeline and can be encapsulated in fairly coarse-grained services, while we want to avoid moving data around any more than necessary. So we don't really seem to need or want the kind of multi-channel chattiness that ESB seems designed to deal with. The only point where there is a lot of communication is at the front end while we are acquiring data from web clients, and ESB wouldn't help there anyway.

I guess I must be missing something, because the more I explore this question, the more convinced I am that ESB smells wrong here.
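The "one-way data pipeline" described above can be sketched as a chain of Python generators, one per coarse-grained stage (ingest, validate, transform, aggregate). The record fields (`region`, `amount`) and the validation rule are hypothetical; in a real system each stage might be a separate service or Spark job rather than a function call, but the data still flows in one direction only, with no back-and-forth for a bus to orchestrate.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator

def ingest(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: parse captured JSON documents (format is hypothetical)."""
    for line in raw_lines:
        yield json.loads(line)

def validate(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: drop records missing required fields (made-up rule)."""
    for r in records:
        if isinstance(r.get("region"), str) and isinstance(r.get("amount"), (int, float)):
            yield r

def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 3: normalise values into the shape downstream stages expect."""
    for r in records:
        yield {"region": r["region"].strip().upper(), "amount": float(r["amount"])}

def aggregate(records: Iterable[dict]) -> Dict[str, float]:
    """Stage 4: summarise for later analysis and reporting."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

raw = [
    '{"region": "north", "amount": 10}',
    '{"region": " North ", "amount": 5.5}',
    '{"region": "south"}',                 # rejected by validate()
    '{"region": "south", "amount": 2}',
]
result = aggregate(transform(validate(ingest(raw))))
print(result)  # {'NORTH': 15.5, 'SOUTH': 2.0}
```

Because each stage is a generator, records stream through one at a time rather than being materialised between stages.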
     
Jayesh A Lalwani
Rancher
Right! If the immediate problem being solved is ingestion and transformation of terabytes of data, and the long-term requirement is analytics on the data, then I would start by picking a database that can scale as the data scales and integrates well with cloud back-end technologies. This could be a traditional RDBMS or a NoSQL database like MongoDB. Then I would build something using Hadoop/Spark that can do the ingestion/transformation. In fact, I bet that if you google for cloud-based ETL or big data ETL, you will find plenty of off-the-shelf (OTS) solutions. You don't need to reinvent the wheel.

ESB seems like someone is trying to over-engineer this.

Note that since your back-end analytical requirements are not defined yet, I can bet my bottom dollar that your ingestion and transformation business rules are going to change big time. Generally, you want to design your database to make the analytics easier, and that depends on what your analytics engine needs. It's more than simply dropping an index on a table and expecting everything to magically work. The simplest way to do this is to a) first figure out what analytics you need, b) design your data store to meet the needs of the analytics engine, and c) design your ETL to feed your data store. However, it seems your architect is going about this in the opposite direction. I am very, very certain that 3 months after your company implements this, they will figure out ... ehh... it needs a lot of rework.
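The a) / b) / c) ordering above can be made concrete with a small sketch using Python's built-in sqlite3 module. Everything here is hypothetical: the "daily totals per region" question stands in for whatever analytics the business eventually needs, and the table and function names are made up. The point is the order: the query is fixed first, the table shape is derived from it, and the ETL is written last to produce exactly that shape.

```python
import sqlite3
from collections import defaultdict
from typing import Iterable, Tuple

# a) Start from the analytics question (hypothetical): daily totals for one region.
ANALYTICS_QUERY = ("SELECT region, day, total FROM daily_totals "
                   "WHERE region = ? ORDER BY day")

# b) Design the store around that query: one pre-aggregated row per (region, day).
SCHEMA = """CREATE TABLE daily_totals (
    region TEXT NOT NULL,
    day    TEXT NOT NULL,
    total  REAL NOT NULL,
    PRIMARY KEY (region, day))"""

# c) Only then write the ETL, shaped to feed exactly that table.
def etl(db: sqlite3.Connection, raw_rows: Iterable[Tuple[str, str, float]]) -> None:
    totals = defaultdict(float)
    for region, timestamp, amount in raw_rows:
        # Normalise the region name and truncate an ISO-8601 timestamp to its date.
        totals[(region.strip().upper(), timestamp[:10])] += amount
    db.executemany("INSERT INTO daily_totals (region, day, total) VALUES (?, ?, ?)",
                   [(r, d, t) for (r, d), t in totals.items()])
    db.commit()

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
etl(db, [("north", "2015-03-01T09:00", 10.0),
         ("North", "2015-03-01T17:30", 5.0),
         ("north", "2015-03-02T08:00", 1.0),
         ("south", "2015-03-01T12:00", 7.0)])
print(db.execute(ANALYTICS_QUERY, ("NORTH",)).fetchall())
# [('NORTH', '2015-03-01', 15.0), ('NORTH', '2015-03-02', 1.0)]
```

This is a one-shot load; re-running it for days already in the table would need an upsert, which is left out to keep the sketch short.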
     
chris webster
Bartender
Thanks again, Jayesh. I'm comfortable with the data side of things, as that's a familiar world to an old Oracle developer like me. It's all this architectural voodoo I find confusing!