I posted recently about the joinery library (thanks so much for all the help I received). I'm now experimenting with another library called nRo/DataFrame, and I'm trying to get the index of the DataFrame.
I've searched through the javadoc but I haven't quite found what I'm looking for.
I've tried a few different methods, such as size() (which gave me the number of rows) and getRows() (which gave me a specific row), but they didn't give me what I needed.
Using getColumns() got me something like:
but I need to get something more similar to :
In the joinery library I was able to access by calling .index() and it returned something like:
That gave me back 2, the position of the height column, but I also need a way to get the position of each row: for every "t" in the height column, which row it is in. Or, as pandas defines it, "The index (row labels) of the DataFrame."
I'd like to know which approach would be best so that I can:
1. use a shuffle method to shuffle all the indexes,
2. then divide the array into only a few indexes and
3. then get a new dataframe based on only the few indexes from the divided array.
For example, say the indexes of my current dataframe are [3, 6, 7, 9, 10, 11]. I shuffle them and get [7, 3, 9, 11, 10, 6], then keep only the first 3, which leaves [7, 3, 9]. Finally, I build a new dataframe from those 3 indexes, which correspond to rows in the original dataframe.
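The shuffle-and-take-first-n part of those steps can be sketched in plain Java, independent of any dataframe library. This is only an illustration: the class name IndexSample and the method sample are made up for this post, and turning the selected indexes back into a dataframe is library-specific and not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of joinery or nRo/DataFrame.
final class IndexSample {

    // Shuffle a copy of the row indexes and keep the first `count` of them.
    static List<Integer> sample(List<Integer> indexes, int count, Random rng) {
        if (count < 0 || count > indexes.size()) {
            throw new IllegalArgumentException("count must be between 0 and " + indexes.size());
        }
        List<Integer> shuffled = new ArrayList<>(indexes); // leave the caller's list untouched
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, count));
    }

    public static void main(String[] args) {
        List<Integer> indexes = Arrays.asList(3, 6, 7, 9, 10, 11);
        List<Integer> picked = sample(indexes, 3, new Random());
        // Prints three distinct values from the original list, in random order.
        System.out.println(picked);
    }
}
```

Passing a seeded Random (for example new Random(42)) makes the selection repeatable between runs, which helps when comparing results.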
Piet Souris wrote: 2) add an IntegerColumn to the dataframe, containing the values 0, 1, ..., dataframe.size() - 1. In the selection you will see these numbers.
Perhaps this approach would be best suited for the needs mentioned above.
Additionally, I do have unique string values in one of the columns, so could I do something like what is quoted below to get an array of numbers?
Thanks so much, the StringColumn worked for getting the index.
Now, you said that the values of the "height" column were "s" and "t". So we should start with
and then you follow it by:
Piet Souris wrote: As far as your previous reply is concerned: can you tell me what you are trying to achieve with all these indices?
Basically, what I'm trying to do is split the dataframe into two dataframes, one containing all the shorts and the other containing all the talls. I then shuffle the indexes (corresponding to the height column of the dataframes) so I can make a test set and a training set to use for data science predictions.
For example, I make a talls dataframe by filtering on "t" in the original dataframe (which contains both short and tall). Say "t" appears in the height column at indexes [0, 3, 4, 6, 8, 9] of the original dataframe. To make an accurate test and training set, I shuffle that array, so afterwards it might look like [3, 9, 4, 0, 8, 6]. I then want to split the array of indexes (which refer to rows of the original dataframe) so that about 20% of the rows containing "t" go to the test set and 80% go to the training set.
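A rough sketch of that per-class split, again in plain Java so it does not depend on either dataframe library. The height column is modelled as a plain List<String>, and StratifiedSplit, indicesOf and split are hypothetical names chosen for this example; how the column is actually read out of the dataframe is an assumption left open here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of joinery or nRo/DataFrame.
final class StratifiedSplit {

    // Row positions at which the label column equals `wanted`.
    static List<Integer> indicesOf(List<String> labels, String wanted) {
        List<Integer> result = new ArrayList<>();
        for (int row = 0; row < labels.size(); row++) {
            if (wanted.equals(labels.get(row))) {
                result.add(row);
            }
        }
        return result;
    }

    // Shuffle a copy of the indexes and split it: element 0 is the test part, element 1 the training part.
    static List<List<Integer>> split(List<Integer> indexes, double testFraction, Random rng) {
        if (testFraction < 0.0 || testFraction > 1.0) {
            throw new IllegalArgumentException("testFraction must be between 0 and 1");
        }
        List<Integer> shuffled = new ArrayList<>(indexes);
        Collections.shuffle(shuffled, rng);
        int testSize = (int) Math.round(shuffled.size() * testFraction);
        List<Integer> test = new ArrayList<>(shuffled.subList(0, testSize));
        List<Integer> train = new ArrayList<>(shuffled.subList(testSize, shuffled.size()));
        return Arrays.asList(test, train);
    }

    public static void main(String[] args) {
        // Made-up height column with "t" at rows 0, 3, 4, 6, 8, 9, matching the example above.
        List<String> height = Arrays.asList("t", "s", "s", "t", "t", "s", "t", "s", "t", "t");
        Random rng = new Random();

        List<List<Integer>> tall = split(indicesOf(height, "t"), 0.2, rng);
        List<List<Integer>> shortRows = split(indicesOf(height, "s"), 0.2, rng);

        System.out.println("tall test rows:   " + tall.get(0));
        System.out.println("tall train rows:  " + tall.get(1));
        System.out.println("short test rows:  " + shortRows.get(0));
        System.out.println("short train rows: " + shortRows.get(1));
    }
}
```

Splitting "t" and "s" separately and then combining the two test parts (and the two training parts) keeps roughly the same short/tall ratio in both sets. With only six "t" rows, 20% rounds to a single test row, so the percentages are only approximate on small data.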
I want to be able to take the test set and predict if it is short or tall.