Raghotham Sripadraj

23 Jul 2020

Knowledge Graph Notes

How should AI explicitly represent knowledge?

My notes from CS520 - Knowledge Graph Seminar

Session 1 - What is a Knowledge Graph?

Speakers: Denny Vrandečić, Jans Aasman, Mikhail Galkin

Denny

Denny spoke about how knowledge graphs (KGs) are used for web search, question answering systems, and data integration systems.

Specific use case of Wikidata

  • KG built on 80M nodes, 1B edges
  • Uses RDF open format (W3C)
  • Schema.org annotation
  • SPARQL for querying

Following the open format has great advantages. One example given was how Wikipedia can leverage other services that expose data. Wikipedia can run a SPARQL query against OpenStreetMap (OSM) data, for example to pick the latitude and longitude from OSM for the ATMs of a specific bank network in Munich.

Jans

Discussion around modern KGs:

  • Semantic graph DB
  • Ontologies & Taxonomies
  • Rule based processing
  • ML & NLP based processing

AllegroGraph's KG implementation:

  1. Document / NLP - Chomsky KG

    The Noam Chomsky Knowledge Graph will link to over 1,000 articles & over 100 books that Chomsky has authored about linguistics, mass media, politics & war.

  2. Event Based for Montefiore Health Care

    Montefiore’s Patient-centered Analytical Learning Machine (PALM), a machine learning platform built from the ground up to predict & prevent life-threatening medical conditions & minimize wait times.

Mikhail

  • Think of KG as world models in terms of entities & relations
  • Encoding can be based on different representations

Representations:

  1. Symbolic - Logic, DB

    • Store as triples

      Subject-Predicate-Object
      
      Rajakumara starring Puneet
      
      Puneet born Chennai
      
  2. Vector - NLP, Computer Vision

    • Embeddings - Leverage high dimensional space & a function to group similar things nearby
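The symbolic representation can be sketched as a tiny in-memory triple store. This is only an illustration, not something shown in the talk: the helper names `match` and `birthplaces_of_cast` are hypothetical, and the data is just the two example triples from these notes.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The two example facts from the notes, stored as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("Rajakumara", "starring", "Puneet"),
    ("Puneet", "born", "Chennai"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def birthplaces_of_cast(triples: List[Triple], film: str) -> List[Tuple[str, str]]:
    """Two-hop query: film -starring-> actor -born-> place."""
    result = []
    for _, _, actor in match(triples, s=film, p="starring"):
        for _, _, place in match(triples, s=actor, p="born"):
            result.append((actor, place))
    return result
```

Calling `birthplaces_of_cast(TRIPLES, "Rajakumara")` chains the two facts together, which is the kind of join a SPARQL query would express declaratively.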

KGs can be viewed from different points of view:

  • Logic programming, RDF way

  • RDBMS - entities are cells, relations are columns

  • Computer Vision - CNN + RPN ⇒ graph inference

  • NLP

    • Knowledge Graph ⇒ Named Entity Recognition

    • Information Retrieval ⇒ Relation Linking

    • Unstructured Sources ⇒ Question Answer System

    • Language Models


Session 2 - How to create a Knowledge Graph?

Speakers: Juan Sequeda, Chris Ré, Xiao Ling

Xiao Ling

Discussion on how Siri Knowledge is built. It is based on triples - subject, predicate, object.

Sources are:

  1. Unstructured text articles
  2. Semi-structured
  3. Structured features
  4. Human Curated

All these are fused together to build the Siri KG. Techniques used: Info Box extraction from Wikipedia, Entity Resolution.

Challenges

  • Fields do not match

    Date of Birth, DoB, Birth Date - all of them mean the same
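A minimal sketch of how mismatched field names could be mapped onto one canonical predicate before fusing sources. The alias table and the names `FIELD_ALIASES`, `canonical_field` and `normalize_record` are assumptions made for illustration; the talk did not describe Siri's actual mechanism at this level.

```python
# Hypothetical alias table: every known spelling maps to one canonical predicate.
FIELD_ALIASES = {
    "date of birth": "birth_date",
    "dob": "birth_date",
    "birth date": "birth_date",
}


def canonical_field(raw: str) -> str:
    """Lowercase, collapse whitespace/underscores, then look up the alias table."""
    key = " ".join(raw.replace("_", " ").lower().split())
    # Unknown fields fall back to a snake_case version of themselves.
    return FIELD_ALIASES.get(key, key.replace(" ", "_"))


def normalize_record(record: dict) -> dict:
    """Rename every field of a source record to its canonical predicate."""
    return {canonical_field(field): value for field, value in record.items()}
```

With this, `"Date of Birth"`, `"DoB"` and `"Birth Date"` all end up as `birth_date`, so records from different sources can be compared on the same key. A production system would likely learn such alignments rather than hard-code them.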

Built:

  • RIBE - Robust Info Box Extraction (HTML input ⇒ Triple as output)
  • Candidate Extraction Models
  • Entity Linking Models
  • Entity Resolution Models

Session 3 - What are some advanced knowledge graphs?

Speakers: Mike Tung, Cogan Shimizu, Marie-Laure Mugnier

Mike

The Diffbot KG is built on the full public web

Their pipeline looks like this

  1. Page type classification - classify page type & language
  2. Visual Extraction - extract product information, metadata links, images, price
  3. NLU - language detection, entity detection
  4. Record Linking

Session 4 - What are some knowledge graph inference algorithms?

Speakers: An Hai Doan, Yuxiao Dong, Georg Gottlob

An Hai Doan

Discussion on Entity Matching use case - The Magellan Project

Entity matching steps

  • Blocking - reduce number of pair comparisons
  • Matching - Rule based / ML based
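The two steps above can be sketched in a few lines. This is not Magellan's API; it is a toy illustration using only the standard library, where the blocking rule (first letter of the last name token) and the 0.8 similarity threshold are arbitrary assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record: dict) -> str:
    # Hypothetical blocking rule: first letter of the last word of the name.
    return record["name"].lower().split()[-1][0]


def candidate_pairs(records: list):
    """Blocking: only records sharing a block key are ever compared."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for group in blocks.values():
        yield from combinations(group, 2)


def is_match(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Matching: a simple rule based on string similarity of the names."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def match_entities(records: list) -> list:
    """Return id pairs judged to refer to the same entity."""
    return [(a["id"], b["id"]) for a, b in candidate_pairs(records) if is_match(a, b)]
```

With four records split into blocks of three and one, blocking cuts the comparisons from six pairs to three. The rule-based `is_match` could be swapped for a trained classifier, which is the ML-based option mentioned above.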

Yuxiao

Discussion on Microsoft Academic Graph (MAG)

Leverages the Heterogeneous Graph Transformer

Georg

Mostly spoke about VADALOG


Session 5 - How to evolve a knowledge graph?

Speakers: Héctor Pérez-Urbina, José Manuel Gómez-Pérez, Mike Uschold

Hector

How to model a dynamic world?

Example of ambiguity - Vocaloids. Not only humans are artists but anime characters too.

Modifying a KG is far easier than modifying an RDBMS. Easy to change → Add properties.

Evolution of KG

  • Can UI still work with the change?
  • Can all downstream applications work?
  • Schema validation

Test, Test, Test!

Jose

Discussion on KG for NLP

  • ML driven NLP
    • Pros - flexible, SoTA, broad
    • Cons - black box, lack real world understanding
  • KG based NLP
    • Pros - curated, logical graph, no training, rich & deep representation
    • Cons - rigid, brittle, expensive manual curation

Real world use cases:

  1. COGITO - Expert NLP based on KG
    • Sentence split / parse
    • Morphological analysis
    • Sentence / logical / grammar analysis
    • Semantic analysis / disambiguation
  2. Vecsigrafo
    • Learns word & concept embeddings in shared space
    • Combines corpus based & graph based knowledge to build word representation
    • Uses & extends swivel algorithm
  3. Transigrafo - Transformers + KG

Mike

  • Get rid of data silos with KG
  • Use triple stores instead of RDBs
  • ALWAYS HAVE A SCHEMA!
  • Use SHACL - SHApes Constraint Language
  • Use OWL + SHACL
  • Build 1 enterprise ontology

Session 6 - How do users interact with knowledge graphs?

Speakers: Amit Prakash, Chaomei Chen, Leilani Gilpin

Amit

Discussion on work at ThoughtSpot

  • ThoughtSpot success is mostly attributed to great UX. They took a year to crack the UX.
  • Stick to simple algorithms, they work 80% of the time
  • Figure out ways to collect data, labels with simple algorithms based on user feedback & interaction
  • Figure out success KPIs

Chaomei

Discussion on work at Drexel University

  • KG on top of research papers & citations
  • Temporal movement analysis
  • Pay attention to relation across domains. Look for bridges across clusters & their importance

Leilani

Discussion on explaining explanations

  • Explainability ≠ Interpretability
  • Interpretable ⇒ understandable to humans
  • Completeness ⇒ describe operations in an accurate way
  • An explanation needs to be both interpretable & complete

Session 7 - What are some prevalent graph engines in industry?

Speakers: Philip Rathle, Brad Bebee, Matei Zaharia

Philip

  • Showcased Neo4j & customers - NASA, eBay, DZD

  • Spoke about the property graph way of modelling

Brad

Showcased AWS Neptune

Matei

Showcased Databricks & their graph framework

Highlighted Use Cases

  1. FINRA

    • Detect illegal trading activity
    • data source - 100 B events / day
    • 30 PB of historical data
  2. Drug Discovery - AstraZeneca

    Recommend new compounds to test using NLP, BERT, GNN.

  3. Network Security - Apple

    • Data source - logins, TCP, SSH
    • Find security threats
    • 100 TB / day
    • 300 B events / day
    • Leverage Delta Lake

Session 8 - What is the role of knowledge graphs in machine learning?

Speakers: Jure Leskovec, Luna Dong, Robert Hoffman

Jure

Discussion on reasoning in KGs using embeddings. KGs are heterogeneous graphs.

  • Traditional tasks - Link prediction / KG completion

    Obama born in US

    Obama nationality?

    learn projections, intersections & other operations.

  • Query2Box
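The Obama example can be illustrated with a translation-style scoring function (in the spirit of TransE, where head + relation should land near the tail). This is a swapped-in, simpler technique chosen for illustration; it is not Query2Box and not the model from the talk. The 2-D vectors are hand-picked rather than learned, and all names below are hypothetical.

```python
import math

# Hand-picked 2-D vectors, purely illustrative; a real model would learn these.
ENTITY = {
    "Obama": (1.0, 0.0),
    "US": (1.0, 1.0),
    "France": (4.0, 3.0),
}
RELATION = {
    "born_in": (0.0, 1.0),
    "nationality": (0.0, 0.9),
}


def score(head: str, relation: str, tail: str) -> float:
    """Negative distance between (head + relation) and tail; higher is better."""
    hx, hy = ENTITY[head]
    rx, ry = RELATION[relation]
    tx, ty = ENTITY[tail]
    return -math.hypot(hx + rx - tx, hy + ry - ty)


def predict_tail(head: str, relation: str, candidates: list) -> str:
    """Link prediction: pick the candidate tail with the best score."""
    return max(candidates, key=lambda tail: score(head, relation, tail))
```

Because the `nationality` vector was deliberately placed close to `born_in`, `predict_tail("Obama", "nationality", ["US", "France"])` picks the US. In a learned model that closeness would emerge from training data in which the two relations usually co-occur.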

Luna

Discussion on Amazon Product Graph

Use KG + ML for search, Alexa, product recommendations

Knowledge Extraction involves

  • Knowledge Alignment
  • Knowledge cleaning
  • Knowledge mining

Session 9 - What are some high value use cases of knowledge graphs?

Speakers: Jay Yu, Apoorv Saxena, David Newman

Jay

Discussion on KG at Intuit.

Use case of Tax programming using logic graph & KG.

Apoorv

Finance data is mostly RDBMS

Risk assessment with company KG

Use Cases:

  • Link Traversal - How much % revenue comes from Boeing?
  • Page Rank
  • Community Detection
  • Link Prediction
  • Graph embedding
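The link traversal question can be sketched over a small weighted edge list. The company names other than Boeing and all revenue figures below are made up for illustration, and `revenue_share` is a hypothetical helper, not something shown in the talk.

```python
# Made-up figures: (supplier, customer) -> annual revenue in millions.
REVENUE_EDGES = {
    ("SupplierA", "Boeing"): 30.0,
    ("SupplierA", "Airbus"): 50.0,
    ("SupplierA", "Other"): 20.0,
    ("SupplierB", "Boeing"): 5.0,
}


def revenue_share(edges: dict, supplier: str, customer: str) -> float:
    """Percentage of the supplier's total revenue that comes from one customer."""
    total = sum(value for (src, _), value in edges.items() if src == supplier)
    if total == 0:
        return 0.0
    return 100.0 * edges.get((supplier, customer), 0.0) / total
```

Here `revenue_share(REVENUE_EDGES, "SupplierA", "Boeing")` follows the supplier's outgoing edges and reports how exposed it is to a single customer, which is the kind of signal a risk assessment could use.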

Leveraging BERT to translate natural language to Cypher query


Session 10 - What are some open research questions on knowledge graphs?

Speakers: Richard Socher, Mark Musen, RV Guha

Richard

Deep Dive on real world use cases of KG with ML

Neural Tensor Network for knowledge base completion

How to think about multi-hop link prediction / reasoning models?

Mark

Contrarian view - What do KGs really know?

Spoke about the MYCIN use case, developed in the 1970s, which was SoTA back then.

KGs were earlier known as semantic networks. Fast forward 50 years and we are at the same place.