Congrats on the new book.
I read that the R language is based on Scheme. Since I already know Clojure and have tons of JVM libraries at my fingertips, why would I use R?
To generalize: if we have heavily supported languages like Java and C#, why use R?
"R" is great for processing and analyzing large amounts of numeric data, providing hundreds of different statistical functions and graphs. Think Excel on steroids. Also check out some of the comments in the commons-math project - many of the functions provided there are based on "R" functions.
I use "R" to analyze gc data - one script will generate dozens of data points and several graphs, all in a few seconds.
Sometimes I need to generate a large number of graphs. That is easy to do with R because of its programming capabilities. R also has tons of libraries for different kinds of analysis. For example, Bioconductor helps to interpret and visualize biological data.
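To make that concrete, here is a minimal sketch of generating one plot file per group with a plain loop; the `measurements` data frame, its columns, and the file names are all invented for illustration.

```r
# Hypothetical data: three groups of (x, y) measurements.
set.seed(42)
measurements <- data.frame(
  group = rep(c("A", "B", "C"), each = 20),
  x     = rep(1:20, times = 3),
  y     = rnorm(60)
)

# One PDF per group -- the loop is what makes "lots of graphs" cheap in R.
for (g in unique(measurements$group)) {
  sub <- subset(measurements, group == g)
  pdf(paste0("plot_", g, ".pdf"))
  plot(sub$x, sub$y, type = "b",
       main = paste("Group", g), xlab = "x", ylab = "y")
  dev.off()
}
```

The same loop scales to hundreds of graphs with no extra effort, which is the point: the plotting code is ordinary R code, so anything you can compute, you can iterate over.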
With so many new languages, limited time, and the steep learning curves often involved, this is a very valid question. In the scheme (no pun intended) of things, R has been around for a long time, but with its increasing popularity (Forbes called it a name to know in 2011), more people are hearing about it.
I would make a distinction between data processing and manipulation on the one hand, and data analysis on the other. R excels at the latter: summarizing data, identifying patterns and trends, predicting outcomes (along with estimates of the accuracy of those predictions), and visualizing the results.
Here are a few examples from my everyday work world (off the top of my head).
Scenario 1 I often need to identify and understand the missing data contained in a database. This includes which fields (variables) have no missing data, which have moderate amounts of missing data, and which have large amounts of missing data. I also want to see any patterns that might exist in the missing data. For example: young white males are reluctant to answer questions about salary; people who skip questions about gender also do not answer questions about marital status and age; a large percentage of people give up after filling out the 3rd screen of an online form. Functions in the VIM (visualization of missing data) package allow me to create a visual map of the missing data in a database, highlighting any patterns present, with 2 or 3 lines of code.
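As a sketch of what that looks like (the `survey` data frame here is made up, and the call assumes the VIM package is installed), the `aggr()` function draws a plot of missing-value counts per variable and the combinations in which they occur:

```r
# Made-up survey data with missing values in several columns.
survey <- data.frame(
  salary = c(50000, NA, 62000, NA, 45000, NA),
  gender = c("M", "F", NA, "M", NA, "F"),
  age    = c(25, 34, NA, 41, NA, 29),
  stringsAsFactors = FALSE
)

# Guarded so the sketch still runs if VIM is not installed.
if (requireNamespace("VIM", quietly = TRUE)) {
  # aggr() plots, per variable, how many values are missing and
  # which combinations of missing values occur together.
  VIM::aggr(survey, numbers = TRUE, prop = FALSE)
}
```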
Scenario 2 I often want to access an external database, run a SQL select statement, and generate a report on the results that tells me:
(1) for each numeric variable, the number of valid and invalid values, minimum, maximum, mean, standard deviation, and median
(2) for each categorical variable, the number and percentage of observations at each unique level of the variable
(3) the ten most unusual observations (cases with unique and/or unlikely combinations of values)
(4) for a specified subset of variables, the percentage of observations above a specified set of threshold values
I want the report to be succinct, and I may (or may not) want the results broken down by the levels of an additional categorical variable (e.g., geographic location).
I can write a short script (usually a page or less) that does all this, and save it as a function in my "library". In the future, I can generate these reports with a one-line invocation:
summarize(database, sql_query, optional_by_variable)
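A hedged sketch of what the core of such a function might look like: `summarize_report`, its arguments, and the sample data are all my own invention, and I pass in a data frame (as if the SQL query had already been run) to keep the example self-contained. It covers only item (1) and part of (2) above.

```r
# Sketch: print summary statistics for every column of a data frame,
# optionally broken down by the levels of one categorical variable.
summarize_report <- function(df, by = NULL) {
  groups <- if (is.null(by)) list(all = df) else split(df, df[[by]])
  for (name in names(groups)) {
    d <- groups[[name]]
    cat("==", name, "==\n")
    for (v in setdiff(names(d), by)) {
      x <- d[[v]]
      if (is.numeric(x)) {
        cat(sprintf(
          "%s: valid=%d missing=%d min=%.1f max=%.1f mean=%.1f sd=%.1f median=%.1f\n",
          v, sum(!is.na(x)), sum(is.na(x)),
          min(x, na.rm = TRUE), max(x, na.rm = TRUE),
          mean(x, na.rm = TRUE), sd(x, na.rm = TRUE), median(x, na.rm = TRUE)))
      } else {
        cat(v, "levels:\n")
        print(table(x, useNA = "ifany"))
      }
    }
  }
}

# Example call on invented data, broken down by region.
dat <- data.frame(age = c(25, NA, 41, 29), region = c("N", "N", "S", "S"))
summarize_report(dat, by = "region")
```

In practice the real function would first fetch the data itself (e.g., via the DBI package), but the reporting logic would look much like this.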
Scenario 3 I manage a subscription-based website for my organization. When I analyze the log files, I would like to generate a single graph that:
(1) lets me see usage over time
(2) identifies usage patterns by gender and age (separate trend lines with indications of variability around the trend lines)
(3) contrasts usage by country (separate graphs by country, all on a single page, arranged to facilitate comparisons)
Additionally, I want the graph to be compelling, attractive, and (given the level of detail), relatively uncluttered.
Using the qplot (quick plot) function in the ggplot2 (grammar of graphics) package, I can create this graph with a single line of code.
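Roughly, the call looks like the following. The `logs` data frame is invented, the exact aesthetics would depend on the real data, and the sketch assumes the ggplot2 package is installed.

```r
# Invented log-file summary: weekly visit counts by gender and country.
set.seed(1)
logs <- data.frame(
  week    = rep(seq(as.Date("2012-01-02"), by = "week", length.out = 10), 4),
  visits  = rpois(40, 100),
  gender  = rep(rep(c("F", "M"), each = 10), 2),
  country = rep(c("US", "UK"), each = 20)
)

# Guarded so the sketch still runs without ggplot2 installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Points plus smoothed trend lines (with variability bands) per gender,
  # one panel per country.
  p <- qplot(week, visits, data = logs, color = gender,
             geom = c("point", "smooth"), facets = ~country)
  print(p)
}
```

One call handles all three requirements: time on the x-axis, color plus smoothing for the gender trend lines, and facets for the per-country panels.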
Conclusion I am a strong believer in using the right tool for the right job. I once created an entire office management system entirely in csh, but I wouldn't do it again. R is an amazing tool for data analysis. There are better tools for data manipulation and processing.
My personal goal is to find tools that make my life easier, are fun to use, and expand my horizons with regard to what is possible. Working with R has been a rewarding, and frankly, humbling experience.