Elastic Search full import

 
Ranch Hand
Posts: 32
We use Elasticsearch in our project and follow the push approach.
We have a scheduler that runs every 30 minutes, reads data from the tables, and pushes it to Elasticsearch using Spring Data Elasticsearch.



We read the whole table and insert it into Elasticsearch every 30 minutes to keep the index data in sync with the tables. The table is owned by another team.
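
Roughly, the job looks like the sketch below (simplified; the table, entity, and repository names are just illustrative, and jdbcTemplate / productRepository are assumed to be injected):

    @Scheduled(fixedRate = 30 * 60 * 1000)   // every 30 minutes
    public void reindex() {
        // read the whole table owned by the other team
        List<Product> rows = jdbcTemplate.query(
                "SELECT * FROM products", new BeanPropertyRowMapper<>(Product.class));
        // push everything into the index through a Spring Data Elasticsearch repository
        productRepository.save(rows);        // saveAll(...) in newer Spring Data versions
    }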

Doubt: rows were deleted from the table. Now, when I do a select *, I will not get the deleted records, but those records are still there in Elasticsearch and will become stale data in the search. How can I delete all the records in Elasticsearch before inserting, as part of the schedule, without affecting the search on the front screen?

Is there any way I can use a transaction here, so that I can delete everything, save, and finally commit?

Thanks,
Baskar.S
 
Marshal
Posts: 79177
I think this post would fit better in another forum. I shall try the Spring forum.
 
Bartender
Posts: 1210
Take a look at the "time to live" (_ttl) field documentation.
You're reinserting all records every 30 minutes, and the entire update process perhaps takes 20 minutes.
So setting _ttl to 55 minutes would automatically purge any document that was not updated in the last reindexing, that is, one whose version has not increased between reindexing cycles.
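
For example, with the pre-2.0 Elasticsearch Java (transport) client, where _ttl is still supported, enabling it on your type with a 55-minute default would look something like this (client is an org.elasticsearch.client.Client; index and type names are placeholders):

    import org.elasticsearch.common.xcontent.XContentBuilder;
    import org.elasticsearch.common.xcontent.XContentFactory;

    XContentBuilder mapping = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("product")                  // your mapping type
                    .startObject("_ttl")
                        .field("enabled", true)
                        .field("default", "55m")         // purge docs not re-indexed within 55 minutes
                    .endObject()
                .endObject()
            .endObject();

    client.admin().indices().preparePutMapping("products")
            .setType("product")
            .setSource(mapping)
            .execute().actionGet();

Keep in mind that _ttl was deprecated in Elasticsearch 2.x and removed in 5.x, so this only applies to the older releases this thread assumes.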
 
Baskar Sikkayan
Ranch Hand
Posts: 32
Hi,

Thanks for the solution.

I am just curious to know: what will happen if I stop my update process for two days for some reason?

So, all data will be deleted and I won't see any data until the next update happens.


Thanks,
Baskar.S
 
Karthik Shiraly
Bartender
Posts: 1210
Yes, what you said will happen if there is no reindexing before the ttl expires.

ttl is probably one of the simplest solutions.
But if your system can't guarantee a reindexing every x minutes, then you'll have to look into solving the reindexing problem at an architecture level.
Options include DB triggers, JMS notifications, or maintaining a trash table that records to be deleted are moved into rather than being removed outright. All of those solutions probably require cooperation from the other team.
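
For instance, if the other team agreed to move deleted rows into a trash table instead of removing them outright, the purge step could look roughly like this (the table, entity, and repository names are made up):

    @Scheduled(fixedRate = 30 * 60 * 1000)
    public void purgeDeletedRecords() {
        // ids of rows the other team moved into the hypothetical deleted_products table
        List<String> deletedIds = jdbcTemplate.queryForList(
                "SELECT id FROM deleted_products", String.class);
        for (String id : deletedIds) {
            productRepository.delete(id);   // drop the stale document (deleteById in newer Spring Data)
        }
        // the trash table can then be cleared so each id is processed only once
    }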

Another option is using two ES servers: while one is online handling searches, the second goes offline and gets reindexed. Then switch them, and swap again when the next reindexing completes.
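
The same switch can also be done inside a single cluster with an index alias: reindex into a fresh index, then atomically repoint the alias the front screen searches against. With the old Java client the swap looks roughly like this (index and alias names are placeholders):

    // "products" is the alias the UI queries; products_v1 is currently live, products_v2 was just rebuilt
    client.admin().indices().prepareAliases()
            .removeAlias("products_v1", "products")
            .addAlias("products_v2", "products")
            .execute().actionGet();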
 