From what I understand, the synchronized keyword syncs the local thread cache with main memory. The volatile keyword basically always reads the variable from main memory at every access. Of course, accessing main memory is much more expensive than accessing the local thread cache, so these operations are expensive. However, a CAS operation uses low-level hardware operations but still has to access main memory. So how is a CAS operation any faster?
It isn't so much about the speed of a particular operation - it is the speed of the combined environment.
The problem with normal synchronization isn't so much that it is slow compared to other methods of memory consistency - it is that it achieves consistency by allowing only one thread to perform the action at a time. A single operation isn't very expensive or inefficient, but the efficiency of the system comes down because parallelism is reduced.
Contrast that with the Compare-And-Set algorithms which don't block other threads. They can achieve memory consistency without preventing multiple tasks from happening at one time, so system efficiency goes up.
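To make the contrast concrete, here is a minimal sketch of the same counter written both ways. The class and method names (CounterComparison, LockedCounter, CasCounter, run) are made up for illustration; only AtomicInteger and its methods come from the JDK. Both versions end up with the right total. The difference is that the lock-based one makes threads wait for each other, while the atomic one never parks a thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterComparison {
    // Lock-based: only one thread at a time may be inside increment() or get().
    static class LockedCounter {
        private int value;
        synchronized void increment() { value++; }
        synchronized int get() { return value; }
    }

    // CAS-based: the update is atomic, but no thread ever holds a lock.
    static class CasCounter {
        private final AtomicInteger value = new AtomicInteger();
        void increment() { value.incrementAndGet(); }
        int get() { return value.get(); }
    }

    // Hypothetical driver: every worker bumps both counters perThread times.
    static int[] run(int threads, int perThread) {
        LockedCounter locked = new LockedCounter();
        CasCounter cas = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    locked.increment();
                    cas.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { locked.get(), cas.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 100_000);
        System.out.println("locked=" + totals[0] + " cas=" + totals[1]);
    }
}
```

This sketch only shows that both approaches are correct. It is not a benchmark, and which one is faster in practice depends on how heavily the counter is contended.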
Comparing CAS operations to volatile is not a fair comparison - they aren't really used for the same purpose. A CAS approach is used for multi-step processes, like incrementing a value, or setting a result only if the value you read at the beginning of the calculation is still the same at the end. You can't do that with volatile: it only ensures that a single get sees an up-to-date snapshot. For a multi-step process like incrementing a value, that is not enough, as the value could be changed by another thread between the get at the start of the increment and the set at the end of the increment.
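The increment case can be sketched like this. The names CasIncrementSketch, casIncrement and run are hypothetical; AtomicInteger.get and AtomicInteger.compareAndSet are the real JDK calls. The volatile counter uses a plain ++, which is still a separate read, add and write, so updates from other threads can be lost in between. The CAS version writes only if the value is still the one it read, and retries otherwise.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasIncrementSketch {
    // volatile makes each read and write visible, but ++ is still three steps.
    static volatile int volatileCount = 0;

    // Hypothetical helper: increment with an explicit compare-and-set retry loop.
    static int casIncrement(AtomicInteger counter) {
        while (true) {
            int seen = counter.get();      // value at the start of the calculation
            int next = seen + 1;           // the multi-step work
            if (counter.compareAndSet(seen, next)) {
                return next;               // nobody changed it in between
            }
            // Another thread got there first: loop and try again with a fresh read.
        }
    }

    // Hypothetical driver: returns { volatile total, CAS total }.
    static int[] run(int threads, int perThread) {
        volatileCount = 0;
        AtomicInteger casCount = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCount++;       // not atomic: increments may be lost
                    casIncrement(casCount);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { volatileCount, casCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 100_000);
        System.out.println("volatile=" + totals[0] + " cas=" + totals[1]);
    }
}
```

The CAS total always equals threads * perThread. The volatile total can come out lower and may differ from run to run. In everyday code you would normally just call incrementAndGet(); the explicit loop is written out here only to show the compare-and-retry idea.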
A confession - I had read about the CAS operator in another nice post here -- at that time I didn't know that by CAS we mean the Compare And Set kind of operations of the atomic variables. I thought it was some low-level OS operator that helped achieve memory consistency in multi-threaded environments. I even googled it but didn't find anything insightful.
I thought (rightly, yes) that CAS must be a standard abbreviation, considering you and Henry have mentioned it. So I bookmarked that post to study the subject later. Now I know CAS is nothing but the Compare And Set kind of operations of the atomic variables -- with all that extra looping overhead and such associated things. I understand there is/could be more to it at the lower levels (as in, there is a processor-level CAS instruction), but that is for me to read about later (I know most of you don't like to talk about the implementation details, and I haven't made my own first-hand effort yet to understand things better).
But thank you so much.
Edit - heck, six edits and I still feel I haven't said it the right way.
subject: Why are CAS (Atomic) operations faster than synchronized or volatile operations