
Need help with list filtering in a Scala application

 
Greenhorn
Posts: 6
Hi,
I am trying to write a Scala program that takes a file location as its first argument. The app then checks whether the keywords in a list occur in that file. If the user supplies a second argument with a list of keywords, we filter against that list; otherwise we filter against hard-coded keywords. I have mapped the specified file with an index, so each line of the file is stored as a string and all the lines are held in a list.



I am struggling with the filter and group-by part. I want the output to be: Map[String, [Int], int]. The string is the keyword. The [Int] in brackets is a list of DISTINCT line numbers where the keyword was found in the specified file. The final int is the overall count of the keyword. So the output for the hard-coded keywords would be:

Keyword   Lines      Count
foo       [1,3..]    10
foo       [7,8,9]    3
foo       []         0


I coded a full solution that uses a mutable map to store the output, along with many foreach calls. However, I want to develop a solution that is as functional as possible, ideally using only immutable structures and as few vars as possible.
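
For readers following along, here is a minimal sketch of one way the requirement described above could be written without mutable state. It is not the poster's code: the object name KeywordIndex, the helper occurrences, the default keywords, and the comma-separated format of the second argument are all assumptions made for illustration. Occurrences are counted as (possibly overlapping) substring matches, which is also an assumption.

```scala
import scala.io.Source

object KeywordIndex {
  // Hypothetical stand-ins for the hard-coded keywords (not from the thread).
  val defaultKeywords: List[String] = List("foo", "bar", "baz")

  // Counts (possibly overlapping) occurrences of keyword within one line.
  def occurrences(line: String, keyword: String): Int =
    if (keyword.isEmpty) 0
    else line.sliding(keyword.length).count(_ == keyword)

  // For each keyword: the 1-based line numbers containing it, and the total count.
  def index(lines: Seq[String], keywords: Seq[String]): Map[String, (List[Int], Int)] = {
    val numbered = lines.zipWithIndex.map { case (text, i) => (i + 1, text) }
    keywords.map { keyword =>
      val hits = numbered.collect {
        case (lineNo, text) if text.contains(keyword) =>
          (lineNo, occurrences(text, keyword))
      }
      // Each line is visited once, so the line numbers are already distinct.
      (keyword, (hits.map(_._1).toList, hits.map(_._2).sum))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val source = Source.fromFile(args(0))
    val lines = try source.getLines().toList finally source.close()
    // Assumption: the optional second argument is a comma-separated keyword list.
    val keywords =
      if (args.length > 1) args(1).split(",").map(_.trim).toList
      else defaultKeywords
    index(lines, keywords).foreach { case (keyword, (lineNos, count)) =>
      val shown = lineNos.mkString("[", ",", "]")
      println(s"$keyword $shown $count")
    }
  }
}
```

A keyword that never appears still gets an entry with an empty list and a count of 0, matching the last row of the table above.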
Thanks
 
Marshal
Posts: 7084

Hussain Faruk wrote:I want the output to be: Map[String, [Int], int]. The first string would be the keyword. The int within the brackets would be a list of DISTINCT line numbers where the keyword was found in the specified file. The final int should contain the overall count of the keyword.

Just to kick this off.

A Map holds only a key and a value. The value can of course be a pair/tuple, but the type as you wrote it isn't valid. Is this what you want: Map[String, (List[Int], Int)]?
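
As a small illustration of that type, here is a sketch showing the tuple-valued map next to an equivalent case-class-valued one. The names MapShapes, Hits, lines, and count are hypothetical and only there to make the two fields readable; the sample values are taken from the table in the question.

```scala
object MapShapes {
  // Hypothetical value type; the field names are illustrative only.
  final case class Hits(lines: List[Int], count: Int)

  // Value is a pair: (distinct line numbers, total count).
  val asTuple: Map[String, (List[Int], Int)] =
    Map(("foo", (List(7, 8, 9), 3)))

  // The same data with a named value type instead of a bare tuple.
  val asCaseClass: Map[String, Hits] =
    asTuple.map { case (keyword, (lines, count)) => keyword -> Hits(lines, count) }
}
```

Either shape works as a Map value; the case class mainly saves readers from remembering which tuple slot is which.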

Please explain which parts work as expected. When you run the current code, could you give a minimal example of the rdd output? Also, why are the variables named so non-descriptively? Give them names with a clear meaning.
 
Liutauras Vilda
Marshal
Posts: 7084

Hussain Faruk wrote:using possibly only immutable structures with as less Var as possible.


Just to clarify (I know you probably had it right in your mind, but the wording is a bit confusing), so that readers don't get an incorrect impression about 'var' and 'val': 'val' doesn't mean that the object assigned to it is immutable. What it actually means is that once such a variable has been assigned a value, it cannot be reassigned.
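
A short sketch of that distinction, with hypothetical names (ValVersusVar, buffer, numbers): a val can refer to a mutable object, and a var can refer to an immutable one.

```scala
import scala.collection.mutable.ListBuffer

object ValVersusVar {
  // val: the reference is fixed, but the ListBuffer it points to is mutable.
  val buffer: ListBuffer[Int] = ListBuffer(1, 2)
  buffer += 3                       // allowed: mutates the object in place
  // buffer = ListBuffer(9)         // would not compile: reassignment to val

  // var: the reference can be rebound, but each List is itself immutable.
  var numbers: List[Int] = List(1, 2)
  numbers = numbers :+ 3            // allowed: rebinds to a new List
}
```

So "immutable structures" and "no vars" are two separate goals: the first is about the collection type, the second about rebinding names.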
 
Hussain Faruk
Greenhorn
Posts: 6

Liutauras Vilda wrote:

Hussain Faruk wrote:using possibly only immutable structures with as less Var as possible.


Just to clarify (I know you probably had in your mind correctly, just the wording is a bit confusing), so the readers wouldn't get an incorrect impression about the 'var', 'val'. 'val' doesn't mean an object assigned to it is immutable, what it actually means, that such variable once it is assigned a value, cannot be reassigned.



Yes, you are right; sorry for the ambiguity. Below is my current solution, but I am wondering whether it could be made more functional and efficient. Also, my solution allows at most 22 keywords, as that's the maximum tuple size. What if I want more than 22 keywords?



From some research, I know I can make it more functional by introducing more higher-order functions, and more efficient by reordering some of the method calls and moving some processing out of the foldLeft. Possibly even by diving into the shapeless or Cats libraries. However, I don't know how to implement these. The above solution is the best I have got so far.
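
Since the posted solution is not reproduced here, the following is only a sketch of one direction this could take, not a rewrite of that code. It assumes a single foldLeft over the lines with an immutable Map as the accumulator, so the number of keywords is not tied to tuple arity and neither shapeless nor Cats is needed. The names SinglePassIndex, Index, and countIn are hypothetical, and counting overlapping substring matches is an assumption.

```scala
object SinglePassIndex {
  // keyword -> (distinct 1-based line numbers, total occurrence count)
  type Index = Map[String, (List[Int], Int)]

  // Counts (possibly overlapping) occurrences of keyword within one line.
  def countIn(line: String, keyword: String): Int =
    if (keyword.isEmpty) 0
    else line.sliding(keyword.length).count(_ == keyword)

  def index(lines: Seq[String], keywords: Seq[String]): Index = {
    val distinctKeywords = keywords.distinct
    // Start every keyword at ([], 0) so that misses still appear in the result.
    val empty: Index = distinctKeywords.map(k => (k, (List.empty[Int], 0))).toMap

    val folded = lines.zipWithIndex.foldLeft(empty) { case (acc, (text, i)) =>
      distinctKeywords.foldLeft(acc) { (inner, keyword) =>
        val n = countIn(text, keyword)
        if (n == 0) inner
        else {
          val (lineNos, total) = inner(keyword)
          // Prepend is O(1); the list is reversed once after the fold.
          inner.updated(keyword, ((i + 1) :: lineNos, total + n))
        }
      }
    }
    folded.map { case (k, (lineNos, total)) => (k, (lineNos.reverse, total)) }
  }
}
```

Each line is scanned once per keyword and no var or mutable collection is involved; whether this is faster than the original depends on code that is not shown in the thread.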
 
Hussain Faruk
Greenhorn
Posts: 6

Sorry, ignore the bit about the max size of 22 for tuples.
 