How to install/use Titan Graph Database: can you help?

 
Ranch Hand
Posts: 66
Hi,
I would like to experiment with Titan Graph Database, with a view to writing a Java class that imports a Kafka (JSON) message into Titan. I am using Vagrant + VirtualBox on Windows and running an Ubuntu Vagrant box. I am rather confused about what to download from https://github.com/thinkaurelius/titan/wiki/Downloads as the documentation states that Titan can be used in two ways:


1) "Embed Titan inside the application executing Gremlin queries directly against the graph within the same JVM. Query execution, Titan’s caches, and transaction handling all happen in the same JVM as the application while data retrieval from the storage backend may be local or remote."
2) "Interact with a local or remote Titan instance by submitting Gremlin queries to the server. Titan natively supports the Gremlin Server component of the Tinkerpop stack."

I am not sure how I should go about it. What is the actual difference between the two ways of using Titan? What does the TinkerPop stack add? I have been trying to get my head around the documentation, but it still eludes me. I ended up downloading this: http://s3.thinkaurelius.com/downloads/titan/titan-0.5.4-hadoop2.zip and the documentation says that it "contains all supported indexing backends, storage backends, Rexster, Hadoop 2, and the Gremlin REPL. Subsumes functionality offered by 0.4.z’s “Titan Server” zipfile." Is that what I need in my case? I was under the impression that Cassandra was supported as a data storage backend, but I get an error when I run the following from my Gremlin console, as per the getting-started documentation at http://s3.thinkaurelius.com/docs/titan/1.0.0/getting-started.html :

gremlin> g = TitanFactory.open('conf/titan-cassandra-es.properties')

The error is:

Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager

Can you help?

Thank you so much,

I
 
Bartender
Posts: 1210

I would like to experiment with Titan Graph Database, with a view to writing a Java class that imports a Kafka (JSON) message into Titan. I am using Vagrant + VirtualBox on Windows and running an Ubuntu Vagrant box



Have a look at Titan's architecture diagram:


Based on that diagram, I think the simplest possible deployment to start experimenting with is:
  • 1 VM running Kafka locally. The same VM can also run a single-node ZooKeeper, which Kafka requires.
  • 1 VM running your app. Your app uses Titan in embedded mode and runs queries using the TinkerPop API. Whether it runs embedded or standalone, Titan supports the same API and query language.
  • 1 VM running a single-node Cassandra locally. Titan uses this Cassandra node for storing data.

Once you have this deployment working, you can independently expand each component out to multiple nodes.
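In this layout, the Titan configuration on the app VM has to point at the Cassandra VM. Here is a minimal sketch in Java of how such a configuration could be generated. It assumes Titan's `storage.backend` and `storage.hostname` configuration keys and the Thrift-based `cassandrathrift` backend (the one named in the error message in the question). The class name and the IP address are made up for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Sketch only: builds the storage settings for an app VM whose embedded
// Titan instance talks to Cassandra running on a separate VM.
class TitanConfigSketch {

    // Returns the two storage settings Titan needs to locate Cassandra.
    // "cassandrathrift" is assumed to be the desired backend here.
    static Properties cassandraConfig(String cassandraHost) {
        Properties props = new Properties();
        props.setProperty("storage.backend", "cassandrathrift");
        props.setProperty("storage.hostname", cassandraHost);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical private-network address of the Cassandra VM.
        Properties props = cassandraConfig("192.168.33.12");

        // Print the settings in .properties format.
        StringWriter out = new StringWriter();
        props.store(out, "Titan storage settings (sketch)");
        System.out.print(out);
    }
}
```

Saved to a file, the printed settings could be passed to `TitanFactory.open(...)` in place of the bundled `conf/titan-cassandra-es.properties`. When everything runs on one machine, the hostname would simply be `127.0.0.1`.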

    What does the Tinkerpop stack add?


    As its documentation states, TinkerPop is a graph abstraction layer over different graph databases and graph processors.
    TinkerPop specifies a standard API (the "Blueprints API") and a standard, database-independent graph query language (the "Gremlin language").
    Graph database products such as Titan and Neo4j then implement that standard Blueprints API and support graph queries expressed in the Gremlin language.
    Notice how this Titan interface inherits from org.apache.tinkerpop.* interfaces.

    what is the actual difference between the two ways of using Titan?


    Embedded vs. Server is a common deployment choice with many DB technologies. Other DBs, such as Derby and Neo4j, support it too.

    In Embedded mode, Titan is loaded into the JVM like any other JAR library, in the same process as the application.
    Use Embedded mode when you're just prototyping and there is only 1 client application, or when you're in production but all the data in the DB is private to that application.
    One drawback of using Embedded mode in production is that if the application JVM crashes, it takes the embedded DB down with it, which might result in unrecoverable data loss.

    Use Server mode in production if multiple applications require that data.
    One benefit of Server mode is that an application JVM crash need not affect the DB, because the DB is a separate process, possibly running on a different server machine.

    Is that what I need in my case?


    The main wiki page says: "Titan 1.0.z is under active development. (recommended)" and "Titan 0.5.z is maintained but no longer extended. (stable)"
    IMO, you should use Titan 1.0.0. Either "Titan 1.0.0 with Hadoop 1" or "Titan 1.0.0 with Hadoop 2" is fine, because you're not using Hadoop at all anyway; you're using Cassandra for storage.

    under the impression that Cassandra as data storage backend was supported but...error


    Cassandra has a client JAR and a server component. The Titan distribution contains just the client JAR that enables it to talk to a Cassandra server.
    You still have to set up the Cassandra server yourself. That's what the 3rd VM above is for.
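If the error does come from no Cassandra server being reachable, a quick check before calling `TitanFactory.open(...)` can confirm it. The sketch below is only an illustration: it assumes Cassandra's Thrift interface is enabled on its default port 9160, and the class and method names are hypothetical. It uses nothing beyond the standard `java.net` classes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch only: checks whether anything accepts TCP connections on the
// port where Cassandra's Thrift interface normally listens.
class CassandraPortCheck {

    // 9160 is Cassandra's default Thrift (rpc_port) setting.
    static final int DEFAULT_THRIFT_PORT = 9160;

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Cassandra runs on the same machine unless a host is given.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        if (isReachable(host, DEFAULT_THRIFT_PORT, 2000)) {
            System.out.println("Something is listening on " + host + ":" + DEFAULT_THRIFT_PORT);
        } else {
            System.out.println("Nothing is listening on " + host + ":" + DEFAULT_THRIFT_PORT
                    + "; start Cassandra before opening the graph.");
        }
    }
}
```

An open port only shows that some process is listening there, not that it is a healthy Cassandra node, so treat a positive result as a hint rather than proof.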
     
    Geane Norm
    Ranch Hand
    Posts: 66
    Karthik, first of all, thank you so much for such a comprehensive answer and for the time you took to explain the matter in detail. This is what makes a community such as Coderanch a great and powerful support tool for beginners.

    The 3-VM deployment sounds like a great idea; however, at the moment I have 1 VM that runs both Kafka and Titan, and I am not running Cassandra. Would it be wrong to have all 3 running on the same VM (Vagrant + VirtualBox with Ubuntu 14.04)? 3 VMs would slow my laptop down quite a lot.

    Karthik Shiraly wrote:IMO, you should use Titan 1.0.0.

    Yes, you are very much right. I had some issues unzipping the file at http://s3.thinkaurelius.com/downloads/titan/titan-1.0.0-hadoop1.zip and got an error saying "End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.", so I resorted to http://s3.thinkaurelius.com/downloads/titan/titan-0.5.4-hadoop1.zip which worked straightaway. I have now downloaded, unzipped and installed titan-1.0.0-hadoop1.zip. It is interesting to notice that after installing Titan 1.0.0 and running bin/gremlin.sh, I can see the following:

    plugin activated: aurelius.titan
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.utilities

    Whereas I cannot see the activated plugins when I open the Gremlin console in Titan 0.5.4. Any reason why?
     
    Karthik Shiraly
    Bartender
    Posts: 1210
    You're welcome! Try it out, and if you run into problems, post them here and we'll help.
     
    Geane Norm
    Ranch Hand
    Posts: 66
    Karthik, I replied to two of your comments, but I used the quote feature wrongly and my replies do not seem to have shown up. Could you have a look at the thread and reply to my questions, please? Thank you so much!

     
    Karthik Shiraly
    Bartender
    Posts: 1210

    Edit: Oh sorry, I didn't notice your new question at first.

    Geane Norm wrote:Whereas I cannot see the activated plugins when I open the Gremlin console in Titan 0.5.4. Any reason why?

    I don't know the exact reason, but Titan underwent a major redesign between v0.5.4 and v0.9.0 to support TinkerPop v3. There are no intermediate versions between 0.5.x and 0.9.x, which suggests the change was substantial. I'm guessing the difference is due to that.
     
    Geane Norm
    Ranch Hand
    Posts: 66
    Brilliant, thank you for this. I was simply curious, as I have been trying to wrap my head around Titan! What about my current configuration? As mentioned in my other comment, I am not using 3 separate VMs; Kafka and Titan run on the same VM. Would that be OK? I do not actually have Cassandra installed. Can it be installed on the same VM too? I came across http://s3.thinkaurelius.com/docs/titan/1.0.0/cassandra.html#cassandra-local-server-mode and it looks like I would be able to run a standalone Cassandra instance on the same VM. Is that correct?

    Thank you so much for your help, you are a star!

     
    Karthik Shiraly
    Bartender
    Posts: 1210
    Yes, running everything on a single beefy VM is fine too. It's even simpler than the 3-VM configuration: each component just runs as a separate process on the same VM.
     
    Geane Norm
    Ranch Hand
    Posts: 66
    Thank you Karthik, I have managed to install it on my VM. I will also need to install it on Windows, as I have all my dev settings there and will be running my Java apps from within IntelliJ. I followed the instructions at http://learntitandb.blogspot.co.uk/2016/01/titan-db-tutorial-1-download-and-run.html because of the well-known issue with gremlin.bat, but I am not sure how to install Cassandra, or whether I even need it in order to simply test Titan core and the TinkerPop API. Can you help? Is the DataStax Community Edition a viable option?

    Thank you so much for all your help

     
    Karthik Shiraly
    Bartender
    Posts: 1210
    I have hardly booted into Windows in recent years, and haven't tried any such deployments on it.
    However, the Apache Cassandra download package does contain .bat scripts, so I think it's just a matter of downloading it and executing the .bat.
    The DataStax edition seems fine too. It's just Cassandra with some additional components to manage it.
     