I don't understand how many of the Pig operators work, in particular how they operate on a data set. Is there any online material that discusses this?
Also, is there anywhere I can create tuples or data sets and try running Pig commands and operators online?
For trying Pig online, one option I can think of is Amazon's AWS EMR (Elastic MapReduce). It's a pay-as-you-go web service.
There are public datasets available on the AWS S3 storage service, such as this one.
If you have never tried Pig at all, start by running it locally in a VM on your machine. Just download it, extract it, and run it in local mode with "pig -x local". Nothing other than Java is required (Pig already has Hadoop embedded, so you don't even have to install Hadoop in this mode).
Under the extracted directory, there's a /tutorial subdirectory with a simple dataset named excite.log. You can learn Pig by trying its operators out on that dataset.
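If it helps to see how the operators transform a data set before running them in Pig, here is a rough Python model of three common ones: FILTER, GROUP, and FOREACH ... GENERATE. It is only a sketch for intuition, not Pig code. The sample tuples and field names (user, query) are made up for illustration, and the Pig Latin statements in the comments show roughly what each step would correspond to.

```python
from collections import defaultdict

# Hypothetical sample data: (user, query) tuples standing in for rows of a log file.
records = [
    ("alice", "hadoop tutorial"),
    ("bob", ""),
    ("alice", "pig latin"),
    ("carol", "pig operators"),
]

# FILTER keeps only the tuples that satisfy a condition.
# Roughly: clean = FILTER records BY query != '';
clean = [r for r in records if r[1] != ""]

# GROUP collects the tuples that share a key into one bag per key.
# Roughly: grouped = GROUP clean BY user;
grouped = defaultdict(list)
for user, query in clean:
    grouped[user].append((user, query))

# FOREACH ... GENERATE emits one output tuple per input tuple (here, one per group).
# Roughly: counts = FOREACH grouped GENERATE group, COUNT(clean);
counts = sorted((user, len(bag)) for user, bag in grouped.items())

print(counts)  # [('alice', 2), ('carol', 1)]
```

The point to notice is that every operator takes a whole relation (a bag of tuples) and produces a new one, so a Pig script is a chain of such steps, each assigned to a new name.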