Have you investigated WebRTC + WebSockets? My knowledge in this area is limited, but here are some suggestions that may help you research further.
These may not be the best possible solutions, so please prototype them first. Also keep in mind that the standards and browser implementations involved are fairly new and sometimes incomplete, so uneven browser support is something to be aware of.
Server to client. The server periodically generates audio that must be pushed to every client, with the transfer initiated by the server.
One way is to encode the PCM audio as base64, broadcast it from the server over WebSockets, and play it in the browser using the Web Audio API.
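For illustration, here's a minimal client-side sketch of that idea. It assumes the server pushes base64-encoded 16-bit mono PCM at 44.1 kHz; the endpoint URL, sample rate, and channel count are placeholders, so match them to whatever your server actually sends:

```typescript
// Hypothetical sketch: play base64-encoded 16-bit PCM chunks arriving
// over a WebSocket using the Web Audio API.
const audioCtx = new AudioContext();
const ws = new WebSocket("wss://example.com/audio"); // placeholder endpoint

ws.onmessage = (event: MessageEvent<string>) => {
  // base64 -> raw bytes -> 16-bit samples.
  const bytes = Uint8Array.from(atob(event.data), c => c.charCodeAt(0));
  const samples = new Int16Array(bytes.buffer);

  // Copy into an AudioBuffer, scaling 16-bit PCM to floats in [-1, 1].
  const buffer = audioCtx.createBuffer(1, samples.length, 44100);
  const channel = buffer.getChannelData(0);
  for (let i = 0; i < samples.length; i++) {
    channel[i] = samples[i] / 32768;
  }

  // Play the chunk immediately.
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start();
};
```

A real client would queue incoming chunks and schedule each one at the end of the previous one (via source.start(when)) so playback stays gapless.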
Client to server. A client can speak into a microphone, and that audio must be sent to the server, where a speech-to-text algorithm will operate on it.
The ideal way would be for the server itself to also act as a WebRTC peer, so you can stream audio from the client to it over a low-latency connection. Such a server implementation seems to exist (see http://doc-kurento.readthedocs.org/en/stable/tutorials/java/tutorial-recorder.html).
An alternative is to use MediaStreamRecorder (or the standard MediaRecorder API) to record audio, upload micro-batches of audio data to the server over regular HTTP, and stream that data into your speech recognition algorithm; a rough sketch follows below.
I'm assuming you already have such a recognition algorithm (if not, try CMU Sphinx).
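Here's what the micro-batching approach could look like with the standard MediaRecorder API. The /speech endpoint and the one-second batch size are assumptions; also note that MediaRecorder emits chunks in the browser's container format (e.g. WebM/Opus), not raw PCM, so the server must decode them before feeding the recognizer:

```typescript
// Hypothetical sketch: capture the microphone and POST a micro-batch of
// recorded audio to the server roughly every second.
async function streamMicToServer(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);

  recorder.ondataavailable = async (event: BlobEvent) => {
    if (event.data.size === 0) return;
    // Each chunk is a compressed container (e.g. WebM/Opus), not raw PCM;
    // the server decodes it and streams the audio to the recognizer.
    await fetch("/speech", {
      method: "POST",
      headers: { "Content-Type": event.data.type },
      body: event.data,
    });
  };

  recorder.start(1000); // emit a dataavailable event every ~1000 ms
}
```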
Client to client. A client can speak into a microphone, and that audio must be sent to another user (possibly using the server as a relay).
This one is the easiest, because it's exactly the peer-to-peer communication use case WebRTC is meant for; see this site for audio conferencing (and many other WebRTC) examples, and a minimal sketch of the offering peer below.
Another site with some good examples:
https://github.com/webrtc/samples
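To give a flavor of what the WebRTC side involves, here is a minimal sketch of the offering peer. It assumes you already have a signaling channel (your existing WebSocket works fine), and sendSignal is a hypothetical helper that forwards messages to the other peer, which must answer and exchange ICE candidates the same way:

```typescript
// Hypothetical sketch of the offering side of a WebRTC audio call.
async function startCall(
  sendSignal: (msg: unknown) => void,
): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
  });

  // Send our microphone track to the remote peer.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  mic.getTracks().forEach(track => pc.addTrack(track, mic));

  // Play whatever audio the remote peer sends back.
  pc.ontrack = (event: RTCTrackEvent) => {
    const audio = new Audio();
    audio.srcObject = event.streams[0];
    void audio.play();
  };

  // Relay ICE candidates through the signaling channel.
  pc.onicecandidate = (event) => {
    if (event.candidate) sendSignal({ candidate: event.candidate });
  };

  // Create the offer; the remote peer answers via the same signaling channel.
  await pc.setLocalDescription(await pc.createOffer());
  sendSignal({ sdp: pc.localDescription });

  return pc;
}
```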