So if my program is a series of referentially transparent expressions that eventually makes some state change in a data store then hypothetically I could, for one service call, replace my program with the value it would create and store that to the database and my data would be in a valid consistent state.
So the natural question is: what are good practices for moving the effects to the end of my referentially transparent expression (RTE) pipeline? I gather that side effects should occur only at the end of a pipeline of RTEs.
The approach I have been taking is that instead of causing side effects along the way I have each method or function build a data structure that is passed along down the function chain that eventually ends up being persisted or returned to the requesting client. Is it that simple?
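As a rough illustration of the approach described above, here is a minimal Python sketch. The `Order` type, the discount rule, and the list standing in for a data store are all hypothetical, invented only for this example. Each step returns a new immutable value, and the single side effect happens in the last function.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    # Hypothetical domain type, used only for illustration.
    customer: str
    subtotal: float
    discount: float = 0.0
    total: float = 0.0


def apply_discount(order: Order) -> Order:
    # Pure: returns a new Order, never mutates the input.
    rate = 0.1 if order.subtotal >= 100 else 0.0
    return replace(order, discount=order.subtotal * rate)


def compute_total(order: Order) -> Order:
    # Pure: derives the total from the fields already on the value.
    return replace(order, total=order.subtotal - order.discount)


def process(order: Order) -> Order:
    # The whole pipeline is one referentially transparent expression.
    return compute_total(apply_discount(order))


def handle_request(order: Order, store: list) -> Order:
    # The only side effect lives here, at the edge.
    # `store` is a plain list standing in for a real database.
    result = process(order)
    store.append(result)
    return result
```

Because `process` is pure, a call such as `process(Order("ann", 200.0))` could be replaced by the value it returns without changing what ends up in the store.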
To add intermediary logging in the middle of the pipeline, would it just be an extra function call, or should it be added with something like aspect-oriented programming, as an AST manipulation?
This is a controversial question. Many imperative programmers ask how they would do logging if they switched to functional programming. One solution that often comes to mind is adding the log data to the result, and then treating it like the rest of the result by applying an effect at the end (storing it in a file, for example). This might seem more functional than treating logging as a side effect, although it has a significant drawback in terms of memory use, since the log accumulates in memory until the effect is applied.
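A minimal sketch of the "log data as part of the result" idea, in Python. The step functions and the `run_pipeline` helper are hypothetical names made up for this example. Each step returns its value together with the log entries it produced, and the caller decides what to do with the accumulated log.

```python
from typing import Callable, List, Tuple

# A step's result: the computed value plus the log entries it produced.
Logged = Tuple[int, List[str]]


def double(x: int) -> Logged:
    return x * 2, [f"double({x}) = {x * 2}"]


def increment(x: int) -> Logged:
    return x + 1, [f"increment({x}) = {x + 1}"]


def run_pipeline(value: int, steps: List[Callable[[int], Logged]]) -> Logged:
    # Thread the value through each step and concatenate the log entries.
    # Nothing is written anywhere; the log is just data in the result.
    log: List[str] = []
    for step in steps:
        value, entries = step(value)
        log = log + entries
    return value, log
```

The memory drawback mentioned above is visible here: the log list grows with every step and is held in memory until the caller finally writes it out.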
But the real question is: why do you need logging? Generally, logging is not a requirement. It is just a way to see what is happening in your program in case something goes wrong. I once worked with a manager who used to say "I don't care if programs have bugs provided there are enough logs to find and fix the bugs quickly". Everyone may have their own opinion. I prefer a program that works and has no logs. This is why I prefer functional programming. Functional programs are safer, so they need fewer tests and fewer logs. The question of tests could be the subject of a whole separate discussion. To stick with logging, functional programmers generally do not use it, unless it is logging the business process, in which case it is part of the output of the program and is treated as such. But logging as a side effect is generally not used in FP. (I know it's hard to believe!)
I wanted to see what Pierre-Yves had to say before jumping in...
My experience has been that while logging is used somewhat less in FP, it is still fairly widely used. Two common idioms are returning all the logging data along with the actual result (as Pierre-Yves suggested), or some sort of fire-and-forget asynchronous logging which can be easily ignored or mocked for development/testing. An important mental shift in both cases is to think of logging output as data, not text. For a good introduction to logs-as-data, watch this talk from Clojure/conj 2012.
In our production Clojure applications, we still have some "regular" (text) logging, output asynchronously (fire-and-forget), but increasingly we're writing "logging" data to files, for aggregation and processing elsewhere. We treat that the same way as we treat other side-effects by pushing them to the edges of our code as much as possible. We often have the idea of a "commit" at the end of any given piece of processing and our computation hands back the result to be returned and a list of thunks to evaluate -- anonymous functions that take no arguments. That list of thunks will include store-to-database side effects or log-to-file side effects or whatever else needs to be done.
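The "result plus a list of thunks" idea can be sketched roughly as follows. This is written in Python rather than Clojure, and `register_user`, `commit`, the dict standing in for a database, and the list standing in for a log file are all assumptions made for illustration, not the code described above.

```python
from typing import Callable, Dict, List, Tuple

# A thunk: a function of no arguments that performs one side effect when called.
Thunk = Callable[[], None]


def register_user(name: str, db: Dict[str, dict], log: List[str]) -> Tuple[dict, List[Thunk]]:
    # Computes the result and *describes* the effects as thunks.
    # Nothing is stored or logged until the thunks are evaluated.
    user = {"name": name.strip().lower()}

    def store_user() -> None:
        db[user["name"]] = user

    def log_registration() -> None:
        log.append(f"registered {user['name']}")

    return user, [store_user, log_registration]


def commit(thunks: List[Thunk]) -> None:
    # The "commit" at the edge: evaluate every deferred effect in order.
    for thunk in thunks:
        thunk()
```

A caller would run `user, thunks = register_user(" Ann ", db, log)`, inspect or test `user` freely, and only then call `commit(thunks)`. One trade-off of this style is that thunks are opaque: a test can count them or run them, but cannot easily inspect what they will do.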
We've also experimented with returning a pure data structure that represents all the things that need committing. That's "nicer" from a functional perspective but not as convenient. You can take a look at a library of mine that supports this approach. We used that for a while at work but it was rather unwieldy in production usage.
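For comparison, here is a rough Python sketch of the pure-data variant. It does not reproduce the API of the library mentioned above; the `StoreRecord` and `WriteLog` types and the `interpret` function are hypothetical. The computation returns plain values describing what needs committing, and a small interpreter at the edge carries them out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class StoreRecord:
    # Describes a database write; performs nothing by itself.
    key: str
    value: dict


@dataclass(frozen=True)
class WriteLog:
    # Describes a log entry; performs nothing by itself.
    message: str


Effect = Union[StoreRecord, WriteLog]


def register_user(name: str) -> Tuple[dict, List[Effect]]:
    # Pure: returns the result plus a description of the effects to commit.
    user = {"name": name.strip().lower()}
    effects: List[Effect] = [
        StoreRecord(user["name"], user),
        WriteLog(f"registered {user['name']}"),
    ]
    return user, effects


def interpret(effects: List[Effect], db: Dict[str, dict], log: List[str]) -> None:
    # The only impure function: turns each description into a real effect.
    for effect in effects:
        if isinstance(effect, StoreRecord):
            db[effect.key] = effect.value
        elif isinstance(effect, WriteLog):
            log.append(effect.message)
        else:
            raise TypeError(f"unknown effect: {effect!r}")
```

Unlike thunks, these descriptions can be compared and asserted on directly in tests. The cost is the extra effect types and the interpreter, which may be part of why this style feels less convenient in practice.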
Whatever approach you take, the key things are to try to keep most of your code pure, and push side-effects out to the edges, as much as is practical.
One way to apply this technique is to use an actor framework. One can then use a dedicated logging actor. In such a case, each actor is responsible for some computation, and the result is then sent as a message to another actor. It is then easy to gather logging info during the computation and then send one message to a business actor with the result of the computation and another one to the logging actor. And if each actor's responsibility is limited, it is often even possible to create logging data only when messages are sent. Of course, this is mainly business logging (typically, which actor is sending which data at which time). What functional programs don't need is logging information about internal state mutation of the components, since components are stateless, so no mutation ever occurs.
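To make the logging-actor idea concrete, here is a toy Python sketch built on threads and queues from the standard library. It is not a real actor framework, and the `Actor` class, the `price` handler, and `run_demo` are hypothetical names invented for this example. A pricing actor computes a result, sends it to a business actor, and sends a separate message to a dedicated logging actor.

```python
import queue
import threading

_STOP = object()  # sentinel message that tells an actor to shut down


class Actor:
    """A toy actor: one mailbox, one thread, one message handled at a time."""

    def __init__(self, handler):
        self._inbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Fire-and-forget: the sender does not wait for the message to be handled.
        self._inbox.put(message)

    def stop(self):
        # Messages queued before the sentinel are handled first.
        self._inbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._inbox.get()
            if message is _STOP:
                break
            self._handler(message)


def run_demo():
    results = []    # stands in for whatever the business actor does with results
    log_lines = []  # stands in for a log file
    business = Actor(results.append)
    logger = Actor(log_lines.append)

    def price(order):
        # The computation itself is pure; effects happen only via messages.
        total = order["qty"] * order["unit_price"]
        business.send(total)
        logger.send(f"pricing -> business: total={total}")

    pricing = Actor(price)
    pricing.send({"qty": 3, "unit_price": 5})
    # Stop upstream first so every downstream message is queued before shutdown.
    pricing.stop()
    business.stop()
    logger.stop()
    return results, log_lines
```

Note that the log entry is created only at the moment the message is sent, and it records which actor sent which data, which is the kind of business logging described above.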