Riccardo Tasso (@riccardotasso)
JUG Trento - 03/11/2015
You'll find the sources and examples for these slides on GitHub
If implicit schemas are such a problem
(2013, Martin Fowler)
the capability of adding a new node to a (distributed) system to improve its performance
it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance
graph theory started in the 18th century
compiler optimizations
computer networks (Internet)
the WWW (hypertexts)
pipe network analysis
circuit theory
language models
protein interaction network
social networks
Any storage system that provides index-free adjacency
(Marko Rodriguez, Peter Neubauer, 2010)
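To make "index-free adjacency" concrete, here is a minimal sketch in plain Java. The `Node` class is a hypothetical toy model, not the Blueprints API: each vertex stores direct references to its neighbours, so moving from a vertex to its neighbours follows those references instead of looking anything up in a global index, and the cost of a hop depends only on the local degree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy vertex: adjacency is stored on the vertex itself.
final class Node {
    final String id;
    final List<Node> out = new ArrayList<>(); // direct references to successors
    final List<Node> in = new ArrayList<>();  // direct references to predecessors

    Node(String id) { this.id = id; }

    // Adds the directed edge this -> target by updating both endpoints locally.
    void connect(Node target) {
        out.add(target);
        target.in.add(this);
    }

    // Neighbour lookup reads only this vertex's own list: no global index involved.
    List<String> outIds() {
        List<String> ids = new ArrayList<>();
        for (Node n : out) {
            ids.add(n.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        Node nicola = new Node("nicola");
        nicola.connect(new Node("cristian"));
        nicola.connect(new Node("riccardo"));
        System.out.println(nicola.outIds()); // [cristian, riccardo]
    }
}
```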
let's try selecting pizzas liked by Nicola!
wow, it's easier to sketch!
With a GraphDB
Unleash the power of TRAVERSAL!
At least with SQL I know how to write it:
SELECT pizza.name
FROM
Person as person
JOIN Likes as likes ON person.id = likes.person
JOIN Pizza as pizza ON likes.pizza = pizza.id
JOIN Contains as contains ON pizza.id = contains.pizza
JOIN Ingredient as ingredient ON contains.ingredient = ingredient.id
JOIN TypicalOf as typicalOf ON ingredient.id = typicalOf.ingredient
JOIN Region as region ON typicalOf.region = region.id
WHERE person.name = 'nicola'
AND region.name = 'taas'
but where are Nicola's friends? :(
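import static com.google.common.collect.Iterables.size;
import static org.junit.Assert.assertEquals;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.tg.TinkerGraph;
import org.joda.time.DateTime;
import org.junit.Test;
@Test public void myFirstGraphTest() {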
Graph graph = new TinkerGraph();
Vertex nicola = graph.addVertex("nicola");
nicola.setProperty("biography",
"A long time ago in galaxy far far away...");
nicola.setProperty("experience", 100);
Vertex cristian = graph.addVertex("cristian");
Vertex riccardo = graph.addVertex("riccardo");
Edge nicolaFriendOfCristian = graph
.addEdge(null, nicola, cristian, "friendOf");
Edge nicolaFriendOfRiccardo = nicola
.addEdge("friendOf", riccardo);
nicolaFriendOfCristian.setProperty("since",
DateTime.parse("2015-02-18"));
nicolaFriendOfRiccardo.setProperty("since",
DateTime.parse("2014-02-26"));
assertEquals(3, size(graph.getVertices()));
assertEquals(2, size(graph.getEdges()));
graph.shutdown();
}
@Test public void complexGraphTest() {
Graph graph = new TinkerGraph();
Vertex nicola = graph.addVertex("nicola");
nicola.setProperty("presentations", Arrays.asList("mvn", "java8"));
Map<String, Object> technologies = new HashMap<>();
technologies.put("java", 20);
technologies.put("sql", 15);
nicola.setProperty("technologies", technologies);
assertEquals(2, size(nicola.getPropertyKeys()));
graph.shutdown();
}
@Test
public void iterateOverMyFirstGraphTest() {
Graph graph = new TinkerGraph();
Vertex nicola = graph.addVertex("nicola");
Vertex cristian = graph.addVertex("cristian");
Vertex riccardo = graph.addVertex("riccardo");
Edge nicolaFriendOfCristian = graph
.addEdge(null, nicola, cristian, "friendOf");
Edge nicolaFriendOfRiccardo = nicola
.addEdge("friendOf", riccardo);
Edge riccardoKnowsCristian = riccardo
.addEdge("knows", cristian);
System.out.println("* out degree:");
for(Vertex v : graph.getVertices()) {
System.out.println(format("%s has %d outgoing edges",
v, size(v.getEdges(Direction.OUT))
));
}
System.out.println("* degree:");
for(Vertex v : graph.getVertices()) {
System.out.println(format("%s's degree: %d",
v, size(v.getEdges(Direction.BOTH))
));
}
System.out.println("* list 'friendOf' edges:");
for(Edge e : graph.getEdges("label", "friendOf")) {
System.out.println(format("%s -> %s",
e.getVertex(Direction.OUT),
e.getVertex(Direction.IN)
));
}
graph.shutdown();
}
Pipes is a lazy dataflow framework using process graphs
@Test public void myFirstPipeTest() {
List<String> romans = Lists.newArrayList("MMXV", "MCMLXXXIII", "I");
TransformPipe<String, Integer> romanToInt = new RomanToIntPipe();
FilterPipe<Integer> bigInteger = new BigIntegerPipe(1000);
TransformPipe<Integer, Integer> makeOdd = new MakeOddPipe();
romanToInt.setStarts(romans);
bigInteger.setStarts((Iterable<Integer>) romanToInt);
makeOdd.setStarts((Iterable<Integer>) bigInteger);
while(makeOdd.hasNext()) {
System.out.println(makeOdd.next());
}
}
output: 4031, 3967
@Test public void metaPipeTest() {
List<String> romans = Lists.newArrayList("MMXV", "MCMLXXXIII", "I");
TransformPipe<String, Integer> romanToInt = new RomanToIntPipe();
FilterPipe<Integer> bigInteger = new BigIntegerPipe(1000);
TransformPipe<Integer, Integer> makeOdd = new MakeOddPipe();
Pipeline<String, Integer> pipeline =
new Pipeline<>(romanToInt, bigInteger, makeOdd);
pipeline.enablePath(true);
pipeline.setStarts(romans);
while(pipeline.hasNext()) {
System.out.println(pipeline.next());
System.out.println(pipeline.getCurrentPath());
}
}
output: 4031: [MMXV, 2015, 4031]
output: 3967: [MCMLXXXIII, 1983, 3967]
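The custom pipes `RomanToIntPipe`, `BigIntegerPipe` and `MakeOddPipe` are not shown in the slides. The sketch below rebuilds the same dataflow, including the path history that `enablePath(true)` records, with lazy `java.util.stream` operations. It assumes, from the printed output only, that `BigIntegerPipe(1000)` keeps values greater than 1000 and that `MakeOddPipe` maps n to 2n + 1; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RomanPipeline {
    private static final Map<Character, Integer> VALUES = Map.of(
            'I', 1, 'V', 5, 'X', 10, 'L', 50, 'C', 100, 'D', 500, 'M', 1000);

    // Roman numeral to int: a smaller symbol before a larger one is subtracted ("CM" = 900).
    static int romanToInt(String roman) {
        int total = 0;
        for (int i = 0; i < roman.length(); i++) {
            int value = VALUES.get(roman.charAt(i));
            boolean subtract = i + 1 < roman.length()
                    && value < VALUES.get(roman.charAt(i + 1));
            total += subtract ? -value : value;
        }
        return total;
    }

    // Returns a new path with value appended, like the history kept by enablePath(true).
    private static List<Object> append(List<Object> path, Object value) {
        List<Object> extended = new ArrayList<>(path);
        extended.add(value);
        return extended;
    }

    // Intermediate stream operations are lazy: elements flow one by one only when collect() pulls them.
    static List<List<Object>> runWithPaths(List<String> romans) {
        return romans.stream()
                .map(roman -> append(List.of(), roman))                      // [roman]
                .map(path -> append(path, romanToInt((String) path.get(0)))) // [roman, int]
                .filter(path -> (Integer) path.get(1) > 1000)                // assumed BigIntegerPipe(1000)
                .map(path -> append(path, 2 * (Integer) path.get(1) + 1))    // assumed MakeOddPipe
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (List<Object> path : runWithPaths(List.of("MMXV", "MCMLXXXIII", "I"))) {
            System.out.println(path.get(path.size() - 1) + ": " + path);
        }
    }
}
```

Running `main` prints the same two lines as the pipeline output above; "I" is dropped by the filter, so it never reaches the last step.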
How is Pipes related to graphs?
@Test public void graphPipeTest() {
Graph graph = PizzaGraphFactory.create();
VerticesVerticesPipe out = new VerticesVerticesPipe(Direction.OUT);
PipeFunction<LoopPipe.LoopBundle, Boolean> proceedCondition =
new PipeFunction<LoopPipe.LoopBundle, Boolean>() {
@Override public Boolean compute(LoopPipe.LoopBundle argument) {
Element v = (Element) argument.getObject();
return !v.getId().equals("trentino");
}
};
LoopPipe loop = new LoopPipe(out, proceedCondition);
Pipeline pipeline = new Pipeline(loop);
pipeline.enablePath(true);
pipeline.setStarts(graph.getVertices("id", "nicola"));
while(pipeline.hasNext()) {
System.out.println(pipeline.next());
System.out.println(pipeline.getCurrentPath());
}
}
v[trentino]: [v[nicola], v[oro], v[mushroom], v[trentino]]
v[trentino]: [v[nicola], v[oro], v[spek], v[trentino]]
v[trentino]: [v[nicola], v[riccardo], v[boscaiola], v[mushroom], v[trentino]]
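The `LoopPipe` above re-applies `out()` until the current vertex is `trentino`, emitting one path per arrival; branches that reach a vertex with no outgoing edges just stop. A plain-Java sketch of that behaviour as a recursive depth-first search. The adjacency map is hypothetical, reconstructed only from the three printed paths (the graph built by `PizzaGraphFactory` has more vertices), and, like the original loop, the sketch has no cycle guard, so it assumes an acyclic graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LoopUntil {
    // Hypothetical subset of the pizza graph: vertex id -> outgoing neighbours.
    static final Map<String, List<String>> OUT = Map.of(
            "nicola", List.of("oro", "riccardo"),
            "riccardo", List.of("boscaiola"),
            "oro", List.of("mushroom", "spek"),
            "boscaiola", List.of("mushroom"),
            "mushroom", List.of("trentino"),
            "spek", List.of("trentino"));

    // Follows out-edges from start and collects every path that reaches target.
    static List<List<String>> pathsTo(String start, String target) {
        List<List<String>> results = new ArrayList<>();
        List<String> path = new ArrayList<>();
        path.add(start);
        explore(path, target, results);
        return results;
    }

    private static void explore(List<String> path, String target, List<List<String>> results) {
        String current = path.get(path.size() - 1);
        if (current.equals(target)) {            // loop condition false: emit the path
            results.add(new ArrayList<>(path));
            return;
        }
        for (String next : OUT.getOrDefault(current, List.of())) {
            path.add(next);                      // one more out() step
            explore(path, target, results);
            path.remove(path.size() - 1);        // backtrack to try the next branch
        }
        // A vertex with no out-edges simply ends this branch, as in LoopPipe.
    }
}
```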
Okay Doc, bring me back to SQL!
Gremlin is a graph traversal language
rayman@HAL9100 ~/gremlin-groovy-2.6.0 $ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> g.loadGraphML('/tmp/pizza.graphml')
==>null
gremlin>
gremlin> g.v('nicola').out().loop(1){it.object.id!='trentino'}.path
==>[v[nicola], v[oro], v[spek], v[trentino]]
==>[v[nicola], v[oro], v[mushroom], v[trentino]]
==>[v[nicola], v[riccardo], v[boscaiola], v[mushroom], v[trentino]]
gremlin>
gremlin> g.v('nicola').out().loop(1){it.object.id!='trentino'}
.in().in().dedup()
==>v[oro]
==>v[boscaiola]
@Test public void pizzaGremlinJavaTest() {
Graph graph = PizzaGraphFactory.create();
GremlinPipeline pipeline = new GremlinPipeline();
pipeline.start(graph.getVertex("nicola"))
.as("explore")
.out().as("outgoing")
.loop("explore", PipesTest.proceedCondition)
.path();
while(pipeline.hasNext())
System.out.println(pipeline.next());
}
pizzas which are liked by Nicola and have an ingredient typical of Trentino
g.V('id', 'nicola')
.out('likes').as('pizza')
.out('contains')
.out('typicalOf').has('id', 'trentino')
.back('pizza')
==>v[oro]
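`back('pizza')` walks forward to test a condition and then returns to the vertex labelled `pizza`. The same question can be phrased in plain Java as a filter with a nested `anyMatch`; the edge maps below are hypothetical data invented to mirror the example, not the real pizza graph. This formulation returns each matching pizza at most once.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LikedTypicalPizzas {
    // Hypothetical edge lists, keyed by the source vertex id.
    static final Map<String, List<String>> LIKES = Map.of(
            "nicola", List.of("margherita", "oro"));
    static final Map<String, List<String>> CONTAINS = Map.of(
            "margherita", List.of("tomatoe", "cheese"),
            "oro", List.of("cheese", "spek", "mushroom"));
    static final Map<String, List<String>> TYPICAL_OF = Map.of(
            "spek", List.of("trentino"),
            "mushroom", List.of("trentino"));

    // Outgoing neighbours along one edge label (empty if there are none).
    static List<String> out(Map<String, List<String>> edges, String id) {
        return edges.getOrDefault(id, List.of());
    }

    // likes -> contains -> typicalOf; keep the pizza if any ingredient matches the region.
    static List<String> query(String person, String region) {
        return out(LIKES, person).stream()
                .filter(pizza -> out(CONTAINS, pizza).stream()
                        .anyMatch(ingredient -> out(TYPICAL_OF, ingredient).contains(region)))
                .collect(Collectors.toList());
    }
}
```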
pizzas which contain at least one of the Boscaiola's ingredients
g.V('id', 'boscaiola').as('pizza')
.out('contains')
.in('contains')
.except('pizza')
==>v[oro]
==>v[diavola]
==>v[margherita]
==>v[oro]
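`except('pizza')` only removes the starting vertex; since there is no `dedup()`, a pizza is returned once per shared ingredient, which explains the repeated `v[oro]`. A plain-Java sketch of `out('contains').in('contains').except(...)` over hypothetical ingredient lists (invented to mirror the output above, so the ordering of the real graph is not reproduced):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SharedIngredients {
    // Hypothetical pizza -> ingredients edges ("contains").
    static final Map<String, List<String>> CONTAINS = Map.of(
            "boscaiola", List.of("cheese", "mushroom"),
            "margherita", List.of("tomatoe", "cheese"),
            "oro", List.of("cheese", "spek", "mushroom"),
            "diavola", List.of("salami", "tomatoe", "cheese"));

    // in('contains'): every pizza whose ingredient list includes the ingredient.
    static List<String> containedIn(String ingredient) {
        List<String> pizzas = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : CONTAINS.entrySet()) {
            if (e.getValue().contains(ingredient)) {
                pizzas.add(e.getKey());
            }
        }
        return pizzas;
    }

    // out('contains').in('contains').except(start): no dedup, one entry per path.
    static List<String> sharingWith(String start) {
        List<String> result = new ArrayList<>();
        for (String ingredient : CONTAINS.getOrDefault(start, List.of())) {
            for (String pizza : containedIn(ingredient)) {
                if (!pizza.equals(start)) {
                    result.add(pizza);
                }
            }
        }
        return result;
    }
}
```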
starting from Nicola, explore all the outgoing relations until Trentino is found
g.v('nicola')
.out()
.loop(1){it.object.id!='trentino'}
.path
==>[v[nicola], v[oro], v[spek], v[trentino]]
==>[v[nicola], v[oro], v[mushroom], v[trentino]]
==>[v[nicola], v[riccardo], v[boscaiola], v[mushroom], v[trentino]]
from Cristian follow two paths: the pizzas he likes and the pizzas which are liked by his friends
g.V('id', 'cristian').copySplit(
_().out('likes').id,
_().both('friendOf').out('likes').id
).fairMerge()
==>diavola
==>margherita
==>oro
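`fairMerge()` interleaves the split branches in round-robin order, one element from each branch in turn (Gremlin also offers `exhaustMerge()`, which drains each branch before moving to the next). A generic sketch of the round-robin strategy over plain `Iterator`s, not the Pipes implementation; the branch contents in `main` are taken from the output above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class FairMerge {
    // Each round takes one element from every branch that still has elements.
    @SafeVarargs
    static <T> List<T> fairMerge(Iterator<? extends T>... branches) {
        List<T> merged = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Iterator<? extends T> branch : branches) {
                if (branch.hasNext()) {
                    merged.add(branch.next());
                    progressed = true;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Branch 1: pizzas Cristian likes; branch 2: pizzas his friends like.
        List<String> merged = fairMerge(
                List.of("diavola").iterator(),
                List.of("margherita", "oro").iterator());
        System.out.println(merged); // [diavola, margherita, oro]
    }
}
```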
t = new Table()
g.V().as('person')
.out('likes')
.out('contains').as('ingredient')
.table(t)
gremlin> t
==>[person:v[cristian], ingredient:v[salami]]
==>[person:v[cristian], ingredient:v[tomatoe]]
==>[person:v[cristian], ingredient:v[cheese]]
==>[person:v[nicola], ingredient:v[tomatoe]]
==>[person:v[nicola], ingredient:v[cheese]]
==>[person:v[nicola], ingredient:v[cheese]]
==>[person:v[nicola], ingredient:v[spek]]
==>[person:v[nicola], ingredient:v[mushroom]]
==>[person:v[riccardo], ingredient:v[cheese]]
==>[person:v[riccardo], ingredient:v[mushroom]]
@Test public void pizzaGremlinGroovyTest() throws ScriptException {
Graph graph = PizzaGraphFactory.create();
ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("gremlin-groovy");
Bindings bindings = engine.createBindings();
bindings.put("graph", graph);
bindings.put("nicola", graph.getVertex("nicola"));
GremlinGroovyPipeline pipeline = (GremlinGroovyPipeline) engine.eval(
"nicola.out().loop(1){it.object.id!='trentino'}.path", bindings);
while(pipeline.hasNext())
System.out.println(pipeline.next());
}
Rexster is a graph server
rayman@HAL9100 ~/rexster-server-2.6.0 $ ./bin/rexster.sh -s
http://localhost:8182/doghouse/main/graph/pizzajug
Frames exposes any Blueprints graph as a collection of interrelated domain objects (a Hibernate for graphs?!).
Furnace is a collection of graph algorithms running over the Blueprints interface
Wait a minute: I've heard of Triple Stores...
a triple is a statement made of: a subject, a predicate and an object
RDF/RDF-S | PGM |
---|---|
URI | local IDs |
open-world | closed-world |
classes and subclasses | one big category (V) |
edge inheritance | edge properties |
PREFIX jug: <http://www.jugtaas.org/owl/jug.owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX po: <http://www.co-ode.org/ontologies/pizza/pizza.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?pizza
WHERE {
jug:nicola a foaf:Person .
jug:nicola jug:likes ?pizza .
?pizza a po:Pizza .
?pizza po:hasIngredient ?ingredient .
?ingredient jug:isTypicalOf ?place .
?place rdfs:label "Trentino-Alto Adige"
}
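A SPARQL engine answers this query by matching each triple pattern against the stored triples and joining the resulting variable bindings. A naive sketch of that evaluation in plain Java (nested-loop join, no indexes, no optimiser); the triples are hypothetical, and `a` is written out as `rdf:type`, the property it abbreviates in SPARQL.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TriplePatterns {
    record Triple(String s, String p, String o) {}

    // Hypothetical data; prefixed names are kept as plain strings.
    static final List<Triple> DATA = List.of(
            new Triple("jug:nicola", "rdf:type", "foaf:Person"),
            new Triple("jug:nicola", "jug:likes", "jug:margherita"),
            new Triple("jug:nicola", "jug:likes", "jug:oro"),
            new Triple("jug:margherita", "rdf:type", "po:Pizza"),
            new Triple("jug:oro", "rdf:type", "po:Pizza"),
            new Triple("jug:margherita", "po:hasIngredient", "jug:tomatoe"),
            new Triple("jug:oro", "po:hasIngredient", "jug:spek"),
            new Triple("jug:spek", "jug:isTypicalOf", "jug:trentino"),
            new Triple("jug:trentino", "rdfs:label", "Trentino-Alto Adige"));

    // The WHERE block of the query above, one pattern per triple.
    static final List<Triple> QUERY = List.of(
            new Triple("jug:nicola", "rdf:type", "foaf:Person"),
            new Triple("jug:nicola", "jug:likes", "?pizza"),
            new Triple("?pizza", "rdf:type", "po:Pizza"),
            new Triple("?pizza", "po:hasIngredient", "?ingredient"),
            new Triple("?ingredient", "jug:isTypicalOf", "?place"),
            new Triple("?place", "rdfs:label", "Trentino-Alto Adige"));

    // A term starting with '?' is a variable: bind it, or check its existing binding.
    private static boolean unify(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String bound = binding.putIfAbsent(term, value);
        return bound == null || bound.equals(value);
    }

    // Joins the patterns left to right, extending every partial solution.
    static List<Map<String, String>> match(List<Triple> data, List<Triple> patterns) {
        List<Map<String, String>> solutions = List.of(Map.of());
        for (Triple pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : solutions) {
                for (Triple t : data) {
                    Map<String, String> binding = new HashMap<>(partial);
                    if (unify(pattern.s(), t.s(), binding)
                            && unify(pattern.p(), t.p(), binding)
                            && unify(pattern.o(), t.o(), binding)) {
                        next.add(binding);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // SELECT ?variable: project one variable out of every solution.
    static List<String> select(String variable, List<Map<String, String>> solutions) {
        List<String> values = new ArrayList<>();
        for (Map<String, String> solution : solutions) {
            values.add(solution.get(variable));
        }
        return values;
    }
}
```

Margherita survives the first patterns but is dropped at `jug:isTypicalOf`, because no triple links `jug:tomatoe` to a place.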
Do you prefer Gremlin?
Semantic Web | Tinkerpop |
---|---|
inference | - |
really a standard | - |
mature | continuously evolving |
- | traversal oriented |
??? | designed to replace your database* |
* only for those use cases that require a Graph!!!
SELECT ?name ?age
WHERE {
?person v:label "person" .
?person v:name ?name .
?person v:age ?age .
?person e:created ?project .
FILTER (?age > 30)
}
sparql-gremlin
...the new standard could also be something other than SPARQL or Gremlin
MATCH (node1)-->(node2)
RETURN node2.propertyA, node2.propertyB
openCypher
Object Oriented Databases are the common ancestor of Graph Databases and Triple Stores!
ODatabaseDocumentTx db = new ODatabaseDocumentTx(DATABASE_URL).create();
initSchema(db);
ODocument luke = new ODocument("Person");
luke.field("name", "Luke");
luke.field("surname", "Skywalker");
// http://starwars.wikia.com/wiki/Luke_Skywalker
ODocument lukePhysical = new ODocument()
.field("species", "human")
.field("gender", "male")
.field("height", 1.72) // implicit meters
.field("mass", 77) // implicit kg
.field("hair", "blonde")
.field("eyes", "blue")
.field("cybernetics", "Prosthetic right hand");
luke.field("physical", lukePhysical);
luke.save();
// http://starwars.wikia.com/wiki/Polis_Massa
ODocument polisMassa = new ODocument("Place")
.field("region", "Outer Rim Territories")
.field("sector", "Subterrel sector")
.field("system", "Polis Massa System")
// ...
.save();
luke.field("born", polisMassa);
luke.save();
db.close();
System.out.println(luke);
Person#9:0 {
name: Luke,
surname: Skywalker,
physical: {
species: human,
gender: male,
height: 1.72,
mass: 77,
hair: blonde,
eyes: blue,
cybernetics: Prosthetic right hand
},
born: #10:0
} v1
Classes | Clusters |
---|---|
logical set of documents | physical partition of documents |
identified by name | identified by sequential id |
a document may be assigned to one class | a document is always assigned to one cluster |
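The cluster shows up in every record ID, written `#<cluster-id>:<cluster-position>` (e.g. `#9:0` and `#10:0` in the output above). A small parsing sketch; the `Rid` record is hypothetical, the driver's own class for record IDs is `ORecordId`.

```java
// Hypothetical value type for an OrientDB record ID "#<cluster-id>:<cluster-position>".
record Rid(int clusterId, long clusterPosition) {

    // Parses e.g. "#9:0" into cluster 9, position 0.
    static Rid parse(String text) {
        if (!text.startsWith("#")) {
            throw new IllegalArgumentException("missing '#': " + text);
        }
        int colon = text.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':': " + text);
        }
        return new Rid(Integer.parseInt(text.substring(1, colon)),
                Long.parseLong(text.substring(colon + 1)));
    }

    @Override
    public String toString() {
        return "#" + clusterId + ":" + clusterPosition;
    }
}
```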
private void initSchema(ODatabaseDocumentTx db) {
OClass person = db.getMetadata().getSchema()
.createClass("Person");
person.createProperty("physical", OType.EMBEDDED);
person.createProperty("born", OType.LINK);
}
Do you remember the Blueprints example? It's the same!
Graph graph = new OrientGraph(DATABASE_URL);
Vertex nicola = graph.addVertex("nicola");
...
Just remember the right implementation!
Graph Model: something more than PGM
You can also play with gremlins!
Define your domain POJO
package com.github.raymanrt.orientdb4jug.orient.starwars;
public class Person {
private String name;
private String surname;
private Physical physical;
private Place born;
public Person() {}
// getters and setters
}
package com.github.raymanrt.orientdb4jug.orient.starwars;
public class Jedi extends Person {
public Jedi() {}
}
Set up the environment
OObjectDatabaseTx db = new OObjectDatabaseTx(DATABASE_URL).create();
db.getEntityManager()
.registerEntityClasses("com.github.raymanrt.orientdb4jug.orient.starwars");
OClass person = db.getMetadata().getSchema().getClass("Person");
person.createProperty("physical", OType.EMBEDDED);
Work with your data
Person padme = db.newInstance(Person.class);
padme.setName("Padme");
padme.setSurname("Amidala");
db.save(padme);
Jedi luke = db.newInstance(Jedi.class);
luke.setName("Luke");
luke.setSurname("Skywalker");
Physical physical = new Physical();
physical.setHair("blonde");
physical.setEyes("blue");
luke.setPhysical(physical);
Place polisMassa = db.newInstance(Place.class);
polisMassa.setName("Polis Massa");
// ...
db.save(polisMassa);
luke.setBorn(polisMassa);
db.save(luke);
assertEquals(2, db.countClass("Person"));
assertEquals(1, db.countClass("Jedi"));
assertEquals(1, db.countClass("Place"));
ODocument lukeAsDocument = db.getRecordByUserObject(luke, false);
assertNotNull(lukeAsDocument.getIdentity());
assertNotEquals(ORecordId.EMPTY_RECORD_ID, lukeAsDocument.getIdentity());
System.out.println(lukeAsDocument);
inspired by SQL to be friendly
each field can be declared as:
SELECT name, @rid, out('likes')
FROM Person
WHERE name = 'Nicola'
operators:
SELECT name
FROM Pizza
WHERE
'trentino' in out('contains').out('typicalOf').name
AND 'nicola' in in('likes').name
SELECT name
FROM Pizza
LET $typicalPlace = out('contains').out('typicalOf').name
WHERE
'trentino' in $typicalPlace
AND 'nicola' in in('likes').name
SELECT name
FROM Pizza
LET $typicalPlace = out('contains').out('typicalOf').name,
$nicolasFriends = (
SELECT FROM Person
WHERE 'nicola' in both('friendOf').name
),
$pizzaLikers = in('likes')
WHERE
'trentino' in $typicalPlace
AND ('nicola' in $pizzaLikers.name OR $nicolasFriends in $pizzaLikers)
TRAVERSE out('friendOf')
FROM (SELECT FROM Person WHERE name = 'nicola')
WHILE $depth <= 3
STRATEGY BREADTH_FIRST
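With `STRATEGY BREADTH_FIRST` the traversal visits records level by level, and `$depth` is 0 for the starting record, so `WHILE $depth <= 3` reaches friends up to three hops away. A plain-Java sketch of a depth-limited breadth-first traversal over a hypothetical adjacency map; the visited set ensures each vertex is returned at most once even when `friendOf` edges form cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BreadthFirst {
    private record Step(String vertex, int depth) {}

    // Returns vertices in breadth-first order, at most maxDepth hops from root (root has depth 0).
    static List<String> traverse(Map<String, List<String>> out, String root, int maxDepth) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(root));
        Deque<Step> queue = new ArrayDeque<>(List.of(new Step(root, 0)));
        while (!queue.isEmpty()) {
            Step step = queue.poll();
            order.add(step.vertex());
            if (step.depth() == maxDepth) {
                continue; // children would exceed the depth limit
            }
            for (String next : out.getOrDefault(step.vertex(), List.of())) {
                if (seen.add(next)) { // false if already queued or visited
                    queue.add(new Step(next, step.depth() + 1));
                }
            }
        }
        return order;
    }
}
```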
SELECT shortestPath(#8:32, #8:10, 'OUT', 'friendOf')
SELECT dijkstra(#8:32, #8:10, 'weightEdgeFieldName', 'OUT')
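`shortestPath()` minimises the number of hops, while `dijkstra()` minimises the sum of a numeric edge property (the field named by its third argument). A textbook Dijkstra sketch in plain Java over a hypothetical weighted adjacency map, assuming non-negative weights as the algorithm requires; it is not OrientDB's implementation.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class Dijkstra {
    record WeightedEdge(String to, double weight) {}

    // Returns the cheapest path from source to target, or an empty list if unreachable.
    static List<String> cheapestPath(Map<String, List<WeightedEdge>> out, String source, String target) {
        Map<String, Double> dist = new HashMap<>();
        Map<String, String> previous = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        dist.put(source, 0.0);
        queue.add(Map.entry(source, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> head = queue.poll();
            String v = head.getKey();
            if (head.getValue() > dist.get(v)) {
                continue; // stale entry: a cheaper distance was already found
            }
            if (v.equals(target)) {
                break;    // with non-negative weights the target's distance is final
            }
            for (WeightedEdge e : out.getOrDefault(v, List.of())) {
                double candidate = dist.get(v) + e.weight();
                if (candidate < dist.getOrDefault(e.to(), Double.POSITIVE_INFINITY)) {
                    dist.put(e.to(), candidate);
                    previous.put(e.to(), v);
                    queue.add(Map.entry(e.to(), candidate));
                }
            }
        }
        if (!dist.containsKey(target)) {
            return List.of();
        }
        LinkedList<String> path = new LinkedList<>();
        for (String v = target; v != null; v = previous.get(v)) {
            path.addFirst(v); // walk the predecessor chain back to the source
        }
        return path;
    }
}
```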
Try with orientqb (inspired by jOOQ)
Do you prefer this:
Query q = new Query()
.select(Projection.ALL)
.from("Class")
.where(projection("f2").eq(5))
.where(projection("f3").lt(0));
Or that?
String q = "SELECT * " +
"FROM Class " +
"WHERE f2 = 5 AND f3 < 0";
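One practical argument for a builder is that it owns keywords and separators, so a missing space cannot slip in. A toy sketch of the idea; `MiniQuery` and its methods are hypothetical and are not orientqb's API, and values are concatenated without quoting or escaping, so it is not safe for user input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy builder: not orientqb, no quoting or escaping of values.
final class MiniQuery {
    private String projection = "*";
    private String target;
    private final List<String> conditions = new ArrayList<>();

    MiniQuery select(String projection) {
        this.projection = projection;
        return this;
    }

    MiniQuery from(String target) {
        this.target = target;
        return this;
    }

    // Each where() call adds one condition; build() joins them with AND.
    MiniQuery where(String field, String operator, Object value) {
        conditions.add(field + " " + operator + " " + value);
        return this;
    }

    // Keywords and spaces are inserted here, never by hand at the call site.
    String build() {
        StringBuilder sql = new StringBuilder("SELECT ").append(projection)
                .append(" FROM ").append(target);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.toString();
    }
}
```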
our use case for OrientDB
main query:
SELECT name, label,
in('hasActant').out('relatedToChapter') as chapters,
in('hasActant').out('locatedIn') as locations,
in('hasTag') as contributions
FROM #21:6
speech cloud:
SELECT localName, $seq.size()
FROM Agent
LET $seq = (
SELECT FROM Sequence
LET $speakers = in('inSequence').out('speaker')
WHERE
$parent.$current in $speakers AND
#21:6 in $speakers
)
ORDER BY localName
how would you model an edge?
regular edges
lightweight edges
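In OrientDB a regular edge is a record of its own (so it can hold properties such as `since`), whereas a lightweight edge is kept only as a direct link between the two vertices, which saves a record per edge but leaves nowhere to store properties. A conceptual sketch of the two options in plain Java; the classes are hypothetical, model only the outgoing side, and do not reflect OrientDB's storage format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: hypothetical classes, not OrientDB's storage format.
final class EdgeModels {
    // Regular edge: an object of its own, so it can carry properties.
    record EdgeRecord(String label, VertexRecord from, VertexRecord to, Map<String, Object> properties) {}

    static final class VertexRecord {
        final String id;
        final List<EdgeRecord> outEdges = new ArrayList<>();        // regular edges
        final Map<String, List<VertexRecord>> outLinks = new HashMap<>(); // lightweight edges, by label

        VertexRecord(String id) { this.id = id; }
    }

    static EdgeRecord addRegular(String label, VertexRecord from, VertexRecord to,
                                 Map<String, Object> properties) {
        EdgeRecord edge = new EdgeRecord(label, from, to, properties);
        from.outEdges.add(edge);
        return edge;
    }

    // Lightweight edge: just a direct link from vertex to vertex; nowhere to put properties.
    static void addLightweight(String label, VertexRecord from, VertexRecord to) {
        from.outLinks.computeIfAbsent(label, k -> new ArrayList<>()).add(to);
    }
}
```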