The biggest difficulty I see is how to chop up documents into their individual pages. How that's done depends on the formats you need to support. With PDF it's relatively easy (the iText library has a built-in tool that can do that).
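DOC and RTF, on the other hand, have no indication of which content goes on which page: apart from explicit page breaks, pagination is only worked out when a word processor lays the document out for display or printing.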
Learning systems generally aren't based on document types like the ones you mention. Rather, there's a structured knowledge base in plain text or XML files (or, more likely these days, in a CMS).
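Structured content avoids the pagination problem because the unit boundaries are stored explicitly. Below is a minimal sketch that assumes a hypothetical XML layout in which each lesson is a `<unit>` element with `<title>` and `<body>` children. The element names, the `KnowledgeBase` class and `parseUnits` are made up for illustration. It uses only the DOM parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class KnowledgeBase {
    // One lesson unit, i.e. one "page" of the course (hypothetical model).
    static final class Unit {
        final String id;
        final String title;
        final String body;

        Unit(String id, String title, String body) {
            this.id = id;
            this.title = title;
            this.body = body;
        }
    }

    // Parses a hypothetical <course><unit id="..."><title/><body/></unit></course> document
    // and returns the units in document order.
    static List<Unit> parseUnits(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("unit");
        List<Unit> units = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            units.add(new Unit(e.getAttribute("id"), childText(e, "title"), childText(e, "body")));
        }
        return units;
    }

    // Returns the trimmed text of the first descendant with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<course>"
                + "<unit id=\"intro\"><title>Introduction</title><body>What this course covers.</body></unit>"
                + "<unit id=\"basics\"><title>Basics</title><body>First concepts.</body></unit>"
                + "</course>";
        // Each unit can be rendered as its own page; no layout engine is needed to find the boundaries.
        for (Unit u : parseUnits(xml)) {
            System.out.println(u.id + ": " + u.title);
        }
    }
}
```

With a CMS the same idea applies. Each unit would simply be a record in the CMS rather than an XML element.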