23 Jul 2020

Knowledge Graph Notes

How should AI explicitly represent knowledge?

My notes from CS520 - Knowledge Graph Seminar

Session 1 - What is a Knowledge Graph?

Speakers: Denny Vrandečić, Jans Aasman, Mikhail Galkin

Denny

Denny spoke about how knowledge graphs (KGs) are used for web search, question answering systems and data integration systems.

Specific use case: Wikidata

KG with 80M nodes and 1B edges
Uses the open RDF format (W3C standard)
Schema.org annotation
SPARQL for querying

Following an open format has great advantages. One example given was how Wikipedia can leverage other services that expose data: a SPARQL query can pull OpenStreetMap (OSM) data, e.g. the latitude/longitude of ATMs in Munich belonging to a specific bank network.
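To make the federated-query idea concrete, here is a minimal sketch of how a SPARQL-style basic graph pattern is matched: triple patterns with `?variables` are joined one after another over the data. It assumes two in-memory lists stand in for the two sources (a Wikidata-style source that knows the bank, an OSM-style source that knows ATM locations). All identifiers (`wd:Bank1`, `osm:atm1`, the `brand`/`city`/`latlon` predicates) are hypothetical. A real federated query sends parts of the pattern to remote endpoints; this sketch simply merges both sources locally.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.get(p, t) != t:
                return None  # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result

def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: every pattern must match some triple (a join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                m = match(pattern, t, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings

# Hypothetical data: one source knows the bank, the other knows ATM locations.
kg_source = [("wd:Bank1", "label", "ExampleBank")]
osm_source = [
    ("osm:atm1", "brand", "wd:Bank1"),
    ("osm:atm1", "city", "Munich"),
    ("osm:atm1", "latlon", "48.137,11.575"),
    ("osm:atm2", "brand", "wd:Bank2"),
    ("osm:atm2", "city", "Munich"),
    ("osm:atm2", "latlon", "48.150,11.580"),
]

patterns = [
    ("?bank", "label", "ExampleBank"),
    ("?atm", "brand", "?bank"),
    ("?atm", "city", "Munich"),
    ("?atm", "latlon", "?loc"),
]
results = query(patterns, kg_source + osm_source)
print(results)
# [{'?bank': 'wd:Bank1', '?atm': 'osm:atm1', '?loc': '48.137,11.575'}]
```

The shared variable `?bank` is what links the two sources: `osm:atm2` is in Munich too, but its brand does not match the bank found in the first source, so it is filtered out.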

Jans

Discussion around modern KGs:

Semantic graph DB
Ontologies & Taxonomies
Rule based processing
ML & NLP based processing

AllegroGraph-based KG implementations:

Document / NLP - Chomsky KG

The Noam Chomsky Knowledge Graph will link to over 1,000 articles & over 100 books that Chomsky has authored about linguistics, mass media, politics & war.
Event based - Montefiore Health Care

Montefiore’s Patient-centered Analytical Learning Machine (PALM), a machine learning platform built from the ground up to predict & prevent life-threatening medical conditions & minimize wait times.

Mikhail

Think of a KG as a world model expressed in terms of entities & relations
Encoding can be based on different representations

Representations:

Symbolic - Logic, DB

Store as triples

Subject-Predicate-Object

Rajakumara starring Puneet

Puneet born Chennai

Vector - NLP, Computer Vision
- Embeddings - map things into a high-dimensional space where a similarity function places similar things nearby
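A small sketch of "similar things nearby": once entities are vectors, a similarity function such as cosine similarity ranks neighbours. The 3-dimensional vectors below are hand-made for illustration only (real embeddings are learned and typically have hundreds of dimensions), and cosine is just one common choice of similarity function.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical hand-made embeddings, not learned ones.
emb: Dict[str, List[float]] = {
    "Chennai":   [0.9, 0.1, 0.0],
    "Bengaluru": [0.8, 0.2, 0.1],
    "Puneet":    [0.1, 0.9, 0.3],
}

def nearest(name: str) -> str:
    """Return the most similar other entity by cosine similarity."""
    return max((other for other in emb if other != name),
               key=lambda other: cosine(emb[name], emb[other]))

print(nearest("Chennai"))  # Bengaluru
```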

KGs can be viewed from different points of view:

Logic programming, RDF way
RDBMS - entities are cells, relations are columns
Computer Vision - CNN + RPN ⇒ graph inference
NLP
- Knowledge Graph ⇒ Named Entity Recognition
- Information Retrieval ⇒ Relation Linking
- Unstructured Sources ⇒ Question Answer System
- Language Models
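One way to read the RDBMS point of view: the row key identifies an entity and every other column acts as a predicate, so each non-empty cell becomes one triple. A minimal sketch, assuming a hypothetical `films` table whose column names are chosen for illustration:

```python
from typing import Dict, List, Tuple

def rows_to_triples(rows: List[Dict[str, str]], key: str) -> List[Tuple[str, str, str]]:
    """Turn each row into (row[key], column, value) triples, skipping empty cells."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key and value:
                triples.append((subject, column, value))
    return triples

# Hypothetical "films" table with one row.
films = [
    {"title": "Rajakumara", "starring": "Puneet", "language": "Kannada"},
]
print(rows_to_triples(films, key="title"))
# [('Rajakumara', 'starring', 'Puneet'), ('Rajakumara', 'language', 'Kannada')]
```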

Session 2 - How to create a Knowledge Graph?

Speakers: Juan Sequeda, Chris Ré, Xiao Ling

Xiao Ling

Discussion on how Siri Knowledge is built. It is based on triples - subject, predicate, object.

Sources are:

Unstructured text articles
Semi-structured
Structured features
Human Curated

All these are fused together to build the Siri KG. Techniques used: infobox extraction from Wikipedia, entity resolution.

Challenges

Fields do not match

Date of Birth, DoB, Birth Date - all of them mean the same thing
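A common first step for this field-mismatch problem is mapping attribute labels onto one canonical predicate before fusing sources. A minimal sketch using a hand-written alias table; the table, the `birth_date` predicate and the normalization rule are all hypothetical, and the talk did not say how Siri Knowledge does this (production systems often learn such mappings rather than hard-coding them).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical alias table: normalized label -> canonical predicate.
ALIASES: Dict[str, str] = {
    "date of birth": "birth_date",
    "dob": "birth_date",
    "birth date": "birth_date",
}

def normalize_label(label: str) -> str:
    """Lowercase and collapse punctuation/whitespace so 'DoB:' and 'dob' compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()

def canonicalize(triples: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Rewrite known aliases to the canonical predicate; leave unknown predicates as-is."""
    return [(s, ALIASES.get(normalize_label(p), p), o) for s, p, o in triples]

raw = [
    ("Ada Lovelace", "Date of Birth", "1815-12-10"),
    ("Ada Lovelace", "DoB:", "1815-12-10"),
    ("Alan Turing", "Birth Date", "1912-06-23"),
]
# After canonicalization the two Ada Lovelace facts collapse into one.
print(set(canonicalize(raw)))
```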

They built:

RIBE - Robust Info Box Extraction (HTML input ⇒ triples as output)
Candidate Extraction Models
Entity Linking Models
Entity Resolution Models
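RIBE's internals were not described in the talk. As a rough illustration of the input/output contract only (HTML in, triples out), here is a naive extractor that reads `<th>`/`<td>` pairs from an infobox-style table with Python's standard `html.parser`. The markup and subject name are hypothetical, and a robust extractor has to cope with far messier HTML than this.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

class InfoboxParser(HTMLParser):
    """Collect (label, value) pairs from <tr><th>label</th><td>value</td></tr> rows."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._cell: Optional[str] = None  # 'th' or 'td' while inside a cell
        self._label = ""
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is kept because _cell stays set.
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._cell == "th":
            self._label = " ".join("".join(self._text).split())
            self._cell = None
        elif tag == "td" and self._cell == "td":
            value = " ".join("".join(self._text).split())
            if self._label:
                self.pairs.append((self._label, value))
            self._label = ""
            self._cell = None

def extract_triples(subject: str, html: str) -> List[Tuple[str, str, str]]:
    """Turn every infobox row into a (subject, label, value) triple."""
    parser = InfoboxParser()
    parser.feed(html)
    return [(subject, label, value) for label, value in parser.pairs]

html = """
<table class="infobox">
  <tr><th>Born</th><td>Chennai, India</td></tr>
  <tr><th>Occupation</th><td>Actor, <b>singer</b></td></tr>
</table>
"""
print(extract_triples("Puneet", html))
# [('Puneet', 'Born', 'Chennai, India'), ('Puneet', 'Occupation', 'Actor, singer')]
```

The raw labels it emits ("Born", "Occupation") still need the field-name alignment discussed under the challenges before they can be fused.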

Session 3 - What are some advanced knowledge graphs?

Speakers: Mike Tung, Cogan Shimizu, Marie-Laure Mugnier

Mike

Diffbot KG built on the full public web

Their pipeline looks like this:

Page type classification - classify page type & language
Visual Extraction - extract product information, metadata links, images, price
NLU - language detection, entity detection
Record Linking

Session 4 - What are some knowledge graph inference algorithms?

Speakers: An Hai Doan, Yuxiao Dong, Georg Gottlob

An Hai Doan

Discussion on Entity Matching use case - The Magellan Project

Entity matching steps

Blocking - reduce number of pair comparisons
Matching - Rule based / ML based
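The two steps can be sketched as follows. This is an illustration of blocking and matching in general, not Magellan's API; it assumes a hypothetical blocking key (first three letters of the lowercased name plus the city) and a single rule-based matcher on name-token Jaccard similarity with a hypothetical 0.5 threshold.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]

def tokens(s: str) -> Set[str]:
    """Lowercased word tokens, ignoring commas."""
    return set(s.lower().replace(",", " ").split())

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def block_key(rec: Record) -> Tuple[str, str]:
    # Hypothetical blocking key: first three letters of the name + city.
    return (rec["name"].lower()[:3], rec["city"].lower())

def match_entities(left: List[Record], right: List[Record],
                   threshold: float = 0.5) -> Tuple[List[Tuple[str, str]], int]:
    # Blocking: index the right table so only same-key pairs are ever compared.
    blocks: Dict[Tuple[str, str], List[Record]] = defaultdict(list)
    for rrec in right:
        blocks[block_key(rrec)].append(rrec)
    matches: List[Tuple[str, str]] = []
    comparisons = 0
    for lrec in left:
        for rrec in blocks.get(block_key(lrec), []):
            comparisons += 1
            # Matching: a single rule on name-token similarity.
            if jaccard(tokens(lrec["name"]), tokens(rrec["name"])) >= threshold:
                matches.append((lrec["id"], rrec["id"]))
    return matches, comparisons

left = [
    {"id": "a1", "name": "Acme Corp", "city": "Madison"},
    {"id": "a2", "name": "Beta Labs Inc", "city": "Chicago"},
]
right = [
    {"id": "b1", "name": "ACME Corp, Inc", "city": "Madison"},
    {"id": "b2", "name": "Acme Foods", "city": "Chicago"},
    {"id": "b3", "name": "Beta Labs", "city": "Chicago"},
]
matches, comparisons = match_entities(left, right)
print(matches, comparisons)  # [('a1', 'b1'), ('a2', 'b3')] 2
```

Blocking cut the work from 2 × 3 = 6 candidate pairs to 2; the trade-off is that a poorly chosen key can silently drop true matches before the matcher ever sees them.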

Yuxiao

Discussion on Microsoft Academic Graph (MAG)

Leverages the Heterogeneous Graph Transformer (HGT)

Georg

Mostly spoke about VADALOG

Session 5 - How to evolve a knowledge graph?

Speakers: Héctor Pérez-Urbina, José Manuel Gómez-Pérez, Mike Uschold

Hector

How to model a dynamic world?

Example of ambiguity - Vocaloids. Artists are not only humans; anime characters can be artists too.

Modifying a KG is far easier than modifying an RDBMS. Easy to change → just add properties.

When evolving a KG, check:

Can UI still work with the change?
Can all downstream applications work?
Schema validation

Test, Test, Test!

Jose

Discussion on KG for NLP

ML driven NLP
- Pros - flexible, SoTA, broad
- Cons - black box, lacks real-world understanding
KG based NLP
- Pros - curated, logical graph, no training, rich & deep representation
- Cons - rigid, brittle, expensive manual curation

Real world use cases:

COGITO - Expert NLP based on KG
- Sentence split / parse
- Morphological analysis
- Sentence / logical / grammar analysis
- Semantic analysis / disambiguation
Vecsigrafo
- Learns word & concept embeddings in shared space
- Combines corpus based & graph based knowledge to build word representation
- Uses & extends the Swivel algorithm
Transigrafo - Transformers + KG

Mike

Get rid of data silos with KG
Use triple stores instead of RDBs
ALWAYS HAVE A SCHEMA!
Use SHACL - Shapes Constraint Language
Use OWL + SHACL
Build one enterprise ontology
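Real SHACL shapes are themselves written in RDF and checked by a SHACL engine. To show only the idea behind "always have a schema", here is a toy validator in plain Python that checks two SHACL-style constraints per property, a minimum count and a datatype (loosely modelled on `sh:minCount` and `sh:datatype`). The shape format, the `Person` data and the property names are hypothetical; this is not SHACL syntax.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, object]

# Hypothetical shape for Person nodes, loosely modelled on sh:minCount / sh:datatype.
PERSON_SHAPE: Dict[str, dict] = {
    "name":       {"min_count": 1, "datatype": str},
    "birth_year": {"min_count": 0, "datatype": int},
}

def validate(node: str, triples: List[Triple], shape: Dict[str, dict]) -> List[str]:
    """Return human-readable violations for one node (empty list = conforms)."""
    violations = []
    for prop, rules in shape.items():
        values = [o for s, p, o in triples if s == node and p == prop]
        if len(values) < rules["min_count"]:
            violations.append(f"{node}: {prop} needs at least {rules['min_count']} value(s)")
        for v in values:
            if not isinstance(v, rules["datatype"]):
                violations.append(f"{node}: {prop} value {v!r} is not {rules['datatype'].__name__}")
    return violations

data: List[Triple] = [
    ("p1", "name", "Ada"),
    ("p1", "birth_year", 1815),
    ("p2", "birth_year", "unknown"),
]
print(validate("p1", data, PERSON_SHAPE))  # []
print(validate("p2", data, PERSON_SHAPE))
# ['p2: name needs at least 1 value(s)', "p2: birth_year value 'unknown' is not int"]
```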

Session 6 - How do users interact with knowledge graphs?

Speakers: Amit Prakash, Chaomei Chen, Leilani Gilpin

Amit

Discussion on work at ThoughtSpot

ThoughtSpot's success is mostly attributed to great UX. It took them a year to crack the UX.
Stick to simple algorithms; they work 80% of the time
Figure out ways to collect data & labels from user feedback & interaction, and start with simple algorithms
Figure out success KPIs

Chaomei

Discussion on work at Drexel University

KG on top of research papers & citations
Temporal movement analysis
Pay attention to relations across domains. Look for bridges across clusters & their importance

Leilani

Discussion on explaining explanations

Explainability ≠ Interpretability
Interpretable ⇒ understandable to humans
Completeness ⇒ describe operations in an accurate way
Explanation needs to be both interpretable & complete

Session 7 - What are some prevalent graph engines in industry?

Speakers: Philip Rathle, Brad Bebee, Matei Zaharia

Philip

Showcased Neo4j & customers - NASA, eBay, DZD
Spoke about the property graph way of modelling

Brad

Showcased AWS Neptune

Matei

Showcased Databricks & their graph framework

Highlighted Use Cases

FINRA
- Detect illegal trading activity
- Data source - 100 B events / day
- 30 PB of historical data
Drug Discovery - AstraZeneca
- Recommend new compounds to test using NLP, BERT & GNNs
Network Security - Apple
- Data source - logins, TCP, SSH
- Find security threats
- 100 TB / day
- 300 B events / day
- Leverage DeltaLake

Session 8 - What is the role of knowledge graphs in machine learning?

Speakers: Jure Leskovec, Luna Dong, Robert Hoffman

Jure

Discussion on reasoning in KGs using embeddings. KGs are heterogeneous graphs

Traditional tasks - Link prediction / KG completion

Obama born in US

Obama nationality?

Learn projections, intersections & other logical operations in embedding space.
Query2Box
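Query2Box embeds a query as an axis-aligned box (a center plus a non-negative offset) and entities as points; answers are the entities inside or close to the box. The sketch below uses hand-picked 2-d vectors (real embeddings are learned). Relation projection adds the relation's center and offset, and the distance follows the paper's "outside distance plus alpha times inside distance" form, with alpha = 0.2 as an arbitrary choice. For intersection it uses the exact geometric overlap of two boxes, which is a simplification: the paper learns an attention-based intersection operator instead.

```python
from dataclasses import dataclass
from typing import List

Vec = List[float]

@dataclass
class Box:
    center: Vec
    offset: Vec  # non-negative half-widths per dimension

    def lo(self) -> Vec:
        return [c - o for c, o in zip(self.center, self.offset)]

    def hi(self) -> Vec:
        return [c + o for c, o in zip(self.center, self.offset)]

def project(q: Box, rel: Box) -> Box:
    """Relation projection: translate the center and grow the offset."""
    return Box([a + b for a, b in zip(q.center, rel.center)],
               [a + b for a, b in zip(q.offset, rel.offset)])

def intersect(a: Box, b: Box) -> Box:
    """Exact overlap of two boxes (a simplification; assumes the boxes overlap)."""
    lo = [max(x, y) for x, y in zip(a.lo(), b.lo())]
    hi = [min(x, y) for x, y in zip(a.hi(), b.hi())]
    return Box([(l + h) / 2 for l, h in zip(lo, hi)],
               [(h - l) / 2 for l, h in zip(lo, hi)])

def distance(v: Vec, q: Box, alpha: float = 0.2) -> float:
    """L1 distance outside the box plus alpha times L1 distance from center to the clamped point."""
    lo, hi = q.lo(), q.hi()
    outside = sum(max(x - h, 0.0) + max(l - x, 0.0) for x, l, h in zip(v, lo, hi))
    clamped = [min(h, max(l, x)) for x, l, h in zip(v, lo, hi)]
    inside = sum(abs(c - m) for c, m in zip(q.center, clamped))
    return outside + alpha * inside

# Hypothetical embeddings: the anchor entity is a point (zero-offset box).
obama = Box([1.0, 1.0], [0.0, 0.0])
born_in = Box([2.0, 0.0], [0.5, 0.5])
q = project(obama, born_in)  # box for "where was Obama born?"

entities = {"US": [3.0, 1.2], "France": [5.0, 1.0]}
ranked = sorted(entities, key=lambda e: distance(entities[e], q))
print(ranked)  # ['US', 'France']
```

The inside term is down-weighted by alpha so that any point inside the box scores better than points outside it, while points near the center are still slightly preferred.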

Luna

Discussion on Amazon Product Graph

Use KG + ML for search, Alexa, product recommendations

Knowledge Extraction involves

Knowledge Alignment
Knowledge cleaning
Knowledge mining

Session 9 - What are some high value use cases of knowledge graphs?

Speakers: Jay Yu, Apoorv Saxena, David Newman

Jay

Discussion on KG at Intuit.

Use case: tax programming using a logic graph & KG.

Apoorv

Finance data mostly lives in RDBMSs

Risk assessment with company KG

Use Cases:

Link Traversal - What % of revenue comes from Boeing?
Page Rank
Community Detection
Link Prediction
Graph embedding
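Of the use cases listed, PageRank is the easiest to show compactly. A minimal power-iteration sketch over a hypothetical company graph (an edge A -> B means A sells to B, so rank accumulates at large customers). The damping factor 0.85 is the conventional choice; nodes with no out-links spread their rank uniformly so the ranks keep summing to 1.

```python
from typing import Dict, List

def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power iteration; every edge target must also be a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Mass from nodes with no out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                new[w] += damping * rank[v] / len(out)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Hypothetical "sells-to" graph between companies.
graph = {
    "SupplierA": ["Boeing"],
    "SupplierB": ["Boeing", "Airbus"],
    "Boeing": [],
    "Airbus": [],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # Boeing
```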

Leveraging BERT to translate natural language to Cypher queries

Session 10 - What are some open research questions on knowledge graphs?

Speakers: Richard Socher, Mark Musen, RV Guha

Richard

Deep Dive on real world use cases of KG with ML

Neural Tensor Network for knowledge base completion
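The Neural Tensor Network scores a candidate triple (e1, R, e2) as u_R^T tanh(e1^T W_R^[1:k] e2 + V_R [e1; e2] + b_R), where W_R is a d x d x k tensor (one bilinear slice per hidden unit). A minimal pure-Python forward pass, with tiny hand-picked and untrained parameters for a single hypothetical relation (d = 2, k = 2), to show the shape of the computation only; training is out of scope.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]

def bilinear(e1: Vec, W: Mat, e2: Vec) -> float:
    """e1^T W e2 for one d x d tensor slice."""
    return sum(e1[i] * W[i][j] * e2[j] for i in range(len(e1)) for j in range(len(e2)))

def ntn_score(e1: Vec, e2: Vec, W: List[Mat], V: Mat, b: Vec, u: Vec) -> float:
    """g(e1, R, e2) = u^T tanh(e1^T W[1:k] e2 + V [e1; e2] + b)."""
    concat = e1 + e2  # [e1; e2]
    hidden = []
    for s in range(len(W)):  # one hidden unit per tensor slice
        pre = bilinear(e1, W[s], e2)
        pre += sum(v * x for v, x in zip(V[s], concat)) + b[s]
        hidden.append(math.tanh(pre))
    return sum(ui * hu for ui, hu in zip(u, hidden))

# Hypothetical, untrained parameters for one relation.
W = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
V = [[0.1, 0.0, 0.1, 0.0], [0.0, 0.1, 0.0, 0.1]]
b = [0.0, 0.0]
u = [1.0, 0.5]

s_same = ntn_score([1.0, 0.0], [1.0, 0.0], W, V, b, u)
s_diff = ntn_score([1.0, 0.0], [0.0, 1.0], W, V, b, u)
print(s_same > s_diff)  # True
```

For knowledge base completion, candidate tails e2 would be ranked by this score; the bilinear slices are what let the model capture multiplicative interactions between the two entity vectors.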

How should we think about multi-hop link prediction / reasoning models?

Mark

Contrarian view - What do KGs really know?

Spoke about the MYCIN use case, developed in the 1970s, which was SoTA back then.

KGs were earlier known as semantic networks. Fast forward 50 years and we are in the same place.

Raghotham Sripadraj