Knowledge Graph Notes
How should AI explicitly represent knowledge?
My notes from CS520 - Knowledge Graph Seminar
Session 1 - What is a Knowledge Graph?
Speakers: Denny Vrandečić, Jans Aasman, Mikhail Galkin
Denny
Denny spoke about how knowledge graphs (KGs) are used for web search, question answering systems & data integration systems.
Specific use case: Wikidata
- KG with 80M nodes & 1B edges
- Uses the open RDF format (W3C)
- Schema.org annotation
- SPARQL for querying
Following an open format has great advantages. One example given was how Wikipedia can leverage other services that expose data: Wikipedia can run SPARQL queries against OpenStreetMap (OSM) data, e.g. to pick the lat/long from OSM of ATMs in Munich that belong to a specific bank network.
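SPARQL answers a query by matching triple patterns containing ?variables against the graph and joining the resulting bindings. The following is a minimal pure-Python sketch of that idea (a basic graph pattern match), using made-up ATM triples loosely modelled on the Munich example. The predicate names (operator, city, lat, lon) and the data are hypothetical, not the real Wikidata or OSM vocabulary, and querying across separate endpoints is not shown.

```python
# Minimal sketch of SPARQL-style basic graph pattern (BGP) matching over an
# in-memory list of (subject, predicate, object) triples. All data is hypothetical.

def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None if impossible."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to a different value
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return new


def query(triples, patterns):
    """Return every binding that satisfies all patterns (a nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match_pattern(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


triples = [
    ("atm1", "operator", "BankA"), ("atm1", "city", "Munich"),
    ("atm1", "lat", 48.137), ("atm1", "lon", 11.575),
    ("atm2", "operator", "BankB"), ("atm2", "city", "Munich"),
    ("atm2", "lat", 48.150), ("atm2", "lon", 11.580),
]

# Roughly: SELECT ?atm ?lat ?lon WHERE { ?atm operator BankA . ?atm city Munich .
#                                        ?atm lat ?lat . ?atm lon ?lon }
results = query(triples, [
    ("?atm", "operator", "BankA"),
    ("?atm", "city", "Munich"),
    ("?atm", "lat", "?lat"),
    ("?atm", "lon", "?lon"),
])
print(results)  # [{'?atm': 'atm1', '?lat': 48.137, '?lon': 11.575}]
```

Real SPARQL engines use indexes and join-order optimization rather than this nested loop, but the semantics (bind variables, keep only consistent bindings) are the same.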
Jans
Discussion around modern KGs:
- Semantic graph DB
- Ontologies & Taxonomies
- Rule based processing
- ML & NLP based processing
AllegroGraph's KG implementations:
- Document / NLP - Chomsky KG
  The Noam Chomsky Knowledge Graph will link to over 1,000 articles & over 100 books that Chomsky has authored about linguistics, mass media, politics & war.
- Event based - Montefiore Health Care
  Montefiore's Patient-centered Analytical Learning Machine (PALM) is a machine learning platform built from the ground up to predict & prevent life-threatening medical conditions & minimize wait times.
Mikhail
- Think of KGs as world models in terms of entities & relations
- Encoding can be based on different representations
Representations:
- Symbolic - Logic, DB
  - Store as triples (Subject-Predicate-Object), e.g. "Rajakumara starring Puneet", "Puneet born Chennai"
- Vector - NLP, Computer Vision
  - Embeddings - leverage a high-dimensional space & a function that places similar things near each other
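To make the "similar things nearby" idea concrete, here is a minimal sketch using cosine similarity as the closeness function. The 3-dimensional vectors are hand-picked and hypothetical (as is the extra entity "Bengaluru"); real embeddings are learned from data and have many more dimensions, and other models use other distance functions.

```python
import math

# Hypothetical, hand-picked 3-d vectors; real embeddings are learned and much larger.
embeddings = {
    "Chennai": [0.9, 0.1, 0.0],
    "Bengaluru": [0.8, 0.2, 0.1],
    "Puneet": [0.1, 0.9, 0.2],
    "Rajakumara": [0.0, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(name, k=1):
    """Return the k other entities whose vectors point most nearly the same way."""
    query_vec = embeddings[name]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


print(nearest("Chennai"))  # ['Bengaluru']
```

Unlike a triple lookup, this never returns "no answer": it always ranks every entity, which is what makes vector representations useful for fuzzy matching and completion.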
KGs can be viewed from different points of view:
- Logic programming, the RDF way
- RDBMS - entities are cells, relations are columns
- Computer Vision - CNN + RPN ⇒ graph inference
- NLP
  - Knowledge Graph ⇒ Named Entity Recognition
  - Information Retrieval ⇒ Relation Linking
  - Unstructured Sources ⇒ Question Answer System
  - Language Models
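One way to read the RDBMS view: each row's key gives the subject, each other column name is a relation, and each cell is an entity or literal. The sketch below is a generic row-to-triple mapping, similar in spirit to (but not an implementation of) W3C's Direct Mapping; the table layouts and the NULL "director" column are hypothetical, reusing the Rajakumara / Puneet triples from the notes.

```python
def rows_to_triples(rows, key):
    """Map each row to triples: the key column is the subject, every other
    column name is a predicate, and each non-NULL cell is an object."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key and value is not None:  # NULL cells yield no triple
                triples.append((subject, column, value))
    return triples


# Hypothetical tables; "director" is left NULL to show how missing values are handled.
films = [{"title": "Rajakumara", "starring": "Puneet", "director": None}]
people = [{"name": "Puneet", "born": "Chennai"}]

triples = rows_to_triples(films, key="title") + rows_to_triples(people, key="name")
print(triples)  # [('Rajakumara', 'starring', 'Puneet'), ('Puneet', 'born', 'Chennai')]
```

Dropping NULLs is a deliberate choice: a triple store simply has no fact for an unknown value, whereas a table must reserve a cell for it.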
Session 2 - How to create a Knowledge Graph?
Speakers: Juan Sequeda, Chris Ré, Xiao Ling
Xiao Ling
Discussion on how Siri Knowledge is built. It is based on triples - subject, predicate, object.
Sources are:
- Unstructured text articles
- Semi-structured
- Structured features
- Human-curated
All these are fused together to build the Siri KG. Techniques used: infobox extraction from Wikipedia & entity resolution.
Challenges:
- Fields do not match, e.g. Date of Birth, DoB & Birth Date all mean the same thing
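The simplest fix for mismatched field names is to normalize raw labels and map known aliases to one canonical predicate. This is a minimal rule-based sketch; the alias table and the canonical name birthDate are hypothetical, and the models described in the talk are presumably learned rather than a hand-written lookup.

```python
import re

# Hypothetical alias table: raw infobox labels -> one canonical predicate.
ALIASES = {
    "date of birth": "birthDate",
    "dob": "birthDate",
    "birth date": "birthDate",
}


def normalize_label(raw):
    """Lower-case, strip punctuation & extra spaces, then look up the alias table.
    Unknown labels fall through in their cleaned-up form."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = " ".join(key.split())
    return ALIASES.get(key, key)


for label in ["Date of Birth", "DoB", "Birth Date", "D.O.B.", "Height"]:
    print(label, "->", normalize_label(label))
```

A lookup table only covers aliases someone has already seen, which is exactly why the long tail of field names becomes a modelling problem.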
Built:
- RIBE - Robust Info Box Extraction (HTML input ⇒ Triple as output)
- Candidate Extraction Models
- Entity Linking Models
- Entity Resolution Models
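To illustrate "HTML input ⇒ triple as output", here is a minimal rule-based sketch that reads label/value rows from a Wikipedia-style infobox table using Python's standard html.parser. It is not RIBE (which is described as robust and model-based); the HTML snippet is hypothetical, and the extracted labels are raw, so they would still need mapping to canonical predicates.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (label, value) text pairs from <tr><th>label</th><td>value</td></tr> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = None  # "th" or "td" while inside a cell, else None
        self._label_parts, self._value_parts = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._label_parts, self._value_parts = [], []
        elif tag in ("th", "td"):
            self._in_cell = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = None
        elif tag == "tr":
            label = " ".join("".join(self._label_parts).split())
            value = " ".join("".join(self._value_parts).split())
            if label and value:  # skip title rows that only have a <th>
                self.rows.append((label, value))

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still lands in the current cell.
        if self._in_cell == "th":
            self._label_parts.append(data)
        elif self._in_cell == "td":
            self._value_parts.append(data)


def infobox_to_triples(subject, html):
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return [(subject, label, value) for label, value in parser.rows]


html = """
<table class="infobox">
  <tr><th colspan="2">Puneet</th></tr>
  <tr><th>Born</th><td><a href="/wiki/Chennai">Chennai</a>, India</td></tr>
</table>
"""
print(infobox_to_triples("Puneet", html))  # [('Puneet', 'Born', 'Chennai, India')]
```

Hard-coded rules like these break whenever page templates change, which is the motivation for learned candidate-extraction and linking models.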
Session 3 - What are some advanced knowledge graphs?
Speakers: Mike Tung, Cogan Shimizu, Marie-Laure Mugnier
Mike
Diffbot's KG is built on the full public web.
Their pipeline looks like this:
- Page type classification - classify page type & language
- Visual Extraction - extract product information, metadata links, images, price
- NLU - language detection, entity detection
- Record Linking
Session 4 - What are some knowledge graph inference algorithms?
Speakers: An Hai Doan, Yuxiao Dong, Georg Gottlob
An Hai Doan
Discussion on Entity Matching use case - The Magellan Project
Entity matching steps
- Blocking - reduce number of pair comparisons
- Matching - Rule based / ML based
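The two steps can be sketched in a few lines: blocking groups records by a cheap key so that only records within a block are compared, then a rule-based matcher scores each candidate pair. This is a toy illustration, not Magellan's API. The records, the choice of zip code as blocking key, token Jaccard as the similarity and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def block_by(records, key):
    """Blocking: only records sharing the same key value are ever compared."""
    blocks = defaultdict(list)
    for r in records:
        blocks[r[key]].append(r)
    return blocks


def match(records, block_key, threshold=0.5):
    """Rule-based matching: same block AND names overlap at least `threshold`."""
    matches = []
    for block in block_by(records, block_key).values():
        for a, b in combinations(block, 2):
            if jaccard(a["name"], b["name"]) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


records = [
    {"id": "a1", "name": "Acme Widgets Inc", "zip": "94105"},
    {"id": "b7", "name": "Acme Widgets", "zip": "94105"},
    {"id": "c3", "name": "Zeta Labs", "zip": "94105"},
    {"id": "d9", "name": "Acme Widgets", "zip": "10001"},  # different block, never compared
]
print(match(records, "zip"))  # [('a1', 'b7')]
```

Blocking cuts the comparisons here from 6 pairs to 3; the trade-off is that a true match split across blocks (d9, if it were the same company) is silently missed, so the blocking key must be chosen carefully.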
Yuxiao
Discussion on Microsoft Academic Graph (MAG)
Leverages a heterogeneous graph transformer
Georg
Mostly spoke about VADALOG
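Vadalog is a Datalog-based reasoning language. As a rough intuition for how Datalog-style rules derive new facts, here is a naive forward-chaining (fixpoint) evaluation of a simple recursive rule. It shows plain Datalog only and none of Vadalog's extensions; the ownership facts and company names are hypothetical.

```python
def transitive_closure(edges):
    """Naive bottom-up (forward-chaining) evaluation of the Datalog program
        reach(X, Y) :- edge(X, Y).
        reach(X, Z) :- reach(X, Y), edge(Y, Z).
    Rules are re-applied until no new fact is derived (a fixpoint)."""
    reach = set(edges)
    while True:
        derived = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}
        new_facts = derived - reach
        if not new_facts:
            return reach
        reach |= new_facts


# Hypothetical "owns" facts; reach(X, Y) then means "there is an ownership chain from X to Y".
owns = {("HoldCo", "SubA"), ("SubA", "SubB"), ("SubB", "SubC")}
facts = transitive_closure(owns)
print(sorted(facts))
# [('HoldCo', 'SubA'), ('HoldCo', 'SubB'), ('HoldCo', 'SubC'),
#  ('SubA', 'SubB'), ('SubA', 'SubC'), ('SubB', 'SubC')]
```

Production engines avoid re-deriving old facts on every round (semi-naive evaluation) and add features such as aggregation, but the fixpoint idea is the same.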
Session 5 - How to evolve a knowledge graph?
Speakers: Héctor Pérez-Urbina, José Manuel Gómez-Pérez, Mike Uschold
Hector
How to model a dynamic world?
Example of ambiguity - Vocaloids: artists are not only humans, anime characters can be artists too.
Modifying a KG is far easier than modifying an RDBMS: changing it is often just a matter of adding properties.
Evolving a KG:
- Can UI still work with the change?
- Can all downstream applications work?
- Schema validation
Test, Test, Test!
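A small sketch of "evolve, then test": new facts, including a brand-new property, are added without any schema migration, and regression checks confirm that an existing downstream query still returns its old answers. It also flags entities that would break a consumer assuming every artist is human. The triples, the property name isVirtual and both query functions are hypothetical.

```python
# Toy KG as a set of (subject, predicate, object) triples; all data is hypothetical.
kg = {
    ("Alice", "type", "Artist"),
    ("Alice", "type", "Human"),
}


def artists(graph):
    """Downstream 'application' query: every subject typed as an Artist."""
    return sorted(s for (s, p, o) in graph if p == "type" and o == "Artist")


def non_human_artists(graph):
    """Artists that would break a consumer that assumes every artist is human."""
    humans = {s for (s, p, o) in graph if p == "type" and o == "Human"}
    return [a for a in artists(graph) if a not in humans]


before = artists(kg)

# Evolve the KG: no schema migration, just new triples, including a new
# property ("isVirtual") that existing queries know nothing about.
kg |= {
    ("Hatsune Miku", "type", "Artist"),
    ("Hatsune Miku", "isVirtual", True),
}

# Regression checks: existing answers must survive the change.
after = artists(kg)
assert set(before) <= set(after), "evolution dropped existing answers"
print(after)                  # ['Alice', 'Hatsune Miku']
print(non_human_artists(kg))  # ['Hatsune Miku']
```

The second check is the interesting one: the data change was easy, but whether the UI and downstream apps cope with a non-human artist is exactly what the questions above ask.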
Jose
Discussion on KG for NLP
- ML-driven NLP
  - Pros - flexible, SoTA, broad
  - Cons - black box, lacks real-world understanding
- KG-based NLP
  - Pros - curated, logical graph, no training, rich & deep representation
  - Cons - rigid, brittle, expensive manual curation
Real-world use cases:
- COGITO - Expert NLP based on KG
  - Sentence split / parse
  - Morphological analysis
  - Sentence / logical / grammar analysis
  - Semantic analysis / disambiguation
- Vecsigrafo
  - Learns word & concept embeddings in a shared space
  - Combines corpus-based & graph-based knowledge to build word representations
  - Uses & extends the Swivel algorithm
- Transigrafo - Transformers + KG
Mike
- Get rid of data silos with KG
- Use triple stores instead of RDBs
- ALWAYS HAVE A SCHEMA!
- Use SHACL - Shapes Constraint Language
- Use OWL + SHACL
- Build one enterprise ontology
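"Always have a schema" in practice means validating data against shapes. Real SHACL shapes are written in RDF and checked by a SHACL engine; the pure-Python sketch below only mimics four real SHACL terms (sh:path, sh:minCount, sh:maxCount, sh:datatype) to show what a validation report contains. The Person shape, the data and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyShape:
    """Loosely mirrors a SHACL property shape (sh:path, sh:minCount, sh:maxCount, sh:datatype)."""
    path: str
    min_count: int = 0
    max_count: Optional[int] = None
    datatype: Optional[type] = None


def validate(triples, focus_nodes, shapes):
    """Return human-readable violations; an empty list means the data conforms."""
    violations = []
    for node in focus_nodes:
        for shape in shapes:
            values = [o for (s, p, o) in triples if s == node and p == shape.path]
            if len(values) < shape.min_count:
                violations.append(f"{node}: {shape.path} needs at least {shape.min_count} value(s)")
            if shape.max_count is not None and len(values) > shape.max_count:
                violations.append(f"{node}: {shape.path} allows at most {shape.max_count} value(s)")
            if shape.datatype is not None:
                for v in values:
                    if not isinstance(v, shape.datatype):
                        violations.append(f"{node}: {shape.path} value {v!r} is not {shape.datatype.__name__}")
    return violations


# Hypothetical "Person" shape and data.
person_shape = [
    PropertyShape("name", min_count=1, max_count=1, datatype=str),
    PropertyShape("birthYear", max_count=1, datatype=int),
]
triples = [
    ("p1", "name", "Alice"),
    ("p1", "birthYear", 1990),
    ("p2", "birthYear", "unknown"),
]
for problem in validate(triples, ["p1", "p2"], person_shape):
    print(problem)
# p2: name needs at least 1 value(s)
# p2: birthYear value 'unknown' is not int
```

OWL and SHACL play different roles here: OWL describes what the data means (and supports inference), while SHACL shapes like these describe what valid data must look like.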
Session 6 - How do users interact with knowledge graphs?
Speakers: Amit Prakash, Chaomei Chen, Leilani Gilpin
Amit
Discussion on work at ThoughtSpot
- ThoughtSpot's success is mostly attributed to great UX; it took them a year to crack the UX.
- Stick to simple algorithms; they work 80% of the time
- Figure out ways to collect data & labels from user feedback & interaction, using simple algorithms
- Figure out success KPIs
Chaomei
Discussion on work at Drexel University
- KG on top of research papers & citations
- Temporal movement analysis
- Pay attention to relations across domains. Look for bridges across clusters & their importance
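As a rough illustration of "bridges across clusters", the sketch below finds citation edges whose endpoints sit in different clusters and gives each paper a crude importance score: the number of such cross-cluster links it takes part in. The papers, clusters and citations are hypothetical, and the scoring is a simplification; a more standard measure of a node's bridging role is betweenness centrality.

```python
from collections import Counter

# Hypothetical citation edges and a research-field cluster for each paper.
cluster_of = {"p1": "NLP", "p2": "NLP", "p3": "Vision", "p4": "Vision", "p5": "Graphs"}
citations = [("p1", "p2"), ("p3", "p4"), ("p2", "p3"), ("p4", "p5"), ("p1", "p5")]


def cross_cluster_links(edges, cluster_of):
    """Return edges whose endpoints lie in different clusters."""
    return [(u, v) for (u, v) in edges if cluster_of[u] != cluster_of[v]]


bridges = cross_cluster_links(citations, cluster_of)
# Which cluster pairs are connected, and how strongly?
pair_counts = Counter(tuple(sorted((cluster_of[u], cluster_of[v]))) for u, v in bridges)
# Crude importance score: how many cross-cluster links each paper takes part in.
broker_score = Counter(node for edge in bridges for node in edge)

print(bridges)                      # [('p2', 'p3'), ('p4', 'p5'), ('p1', 'p5')]
print(broker_score.most_common(1))  # [('p5', 2)]
```

Counting links is cheap but local; betweenness looks at all shortest paths, so it can also surface papers that connect clusters indirectly.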
Leilani
Discussion on explaining explanations
- Explainability ≠ Interpretability
- Interpretable ⇒ understandable to humans
- Completeness ⇒ describes operations in an accurate way
- An explanation needs to be both interpretable & complete
Session 7 - What are some prevalent graph engines in industry?
Speakers: Philip Rathle, Brad Bebee, Matei Zaharia
Philip
- Showcased Neo4j & customers - NASA, eBay, DZD
- Spoke about the property graph way of modelling
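In the property graph model used by Neo4j, nodes carry labels and key/value properties, and relationships have a type, a direction and their own properties, something a plain subject-predicate-object triple cannot hold without extra modelling. A minimal sketch as Python data structures; the class names, the role property and its value are hypothetical, and the Cypher line in the comment is only an approximate equivalent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    type: str
    end: str
    properties: dict = field(default_factory=dict)  # edges can carry key/value pairs too


# Roughly the Cypher pattern:
# (:Person:Actor {name: 'Puneet'})-[:ACTED_IN {role: 'lead'}]->(:Film {title: 'Rajakumara'})
nodes = {
    "puneet": Node("puneet", {"Person", "Actor"}, {"name": "Puneet"}),
    "rajakumara": Node("rajakumara", {"Film"}, {"title": "Rajakumara"}),
}
rels = [Relationship("puneet", "ACTED_IN", "rajakumara", {"role": "lead"})]


def films_of(person_id):
    """Follow outgoing ACTED_IN relationships and return film titles."""
    return [
        nodes[r.end].properties["title"]
        for r in rels
        if r.start == person_id and r.type == "ACTED_IN"
    ]


print(films_of("puneet"))  # ['Rajakumara']
```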
Brad
Showcased AWS Neptune
Matei
Showcased Databricks & their graph framework
Highlighted use cases:
- FINRA
  - Detect illegal trading activity
  - Data source - 100 B events / day
  - 30 PB of historical data
- Drug Discovery - AstraZeneca
  - Recommend new compounds to test using NLP, BERT, GNN
- Network Security - Apple
  - Data source - logins, TCP, SSH
  - Find security threats
  - 100 TB / day
  - 300 B events / day
  - Leverage Delta Lake
Session 8 - What is the role of knowledge graphs in machine learning?
Speakers: Jure Leskovec, Luna Dong, Robert Hoffman
Jure
Discussion on reasoning in KGs using embeddings. KGs are heterogeneous graphs.
- Traditional tasks - Link prediction / KG completion, e.g. Obama born in US ⇒ Obama's nationality?
- Learn projections, intersections & other operations in embedding space
- Query2Box
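Query2Box represents a query as a box (center plus non-negative offset) and answers it with the entities closest to that box. Below is a minimal sketch of two of its pieces as I understand them: relation projection (add the relation's center and offset) and the box distance, which counts distance outside the box fully and distance inside it with a weight alpha < 1. Intersection is omitted. All vectors are made up rather than trained, and alpha = 0.2 is an assumed value.

```python
def project(center, offset, rel_center, rel_offset):
    """Relation projection: shift the box center and widen its offset."""
    return (
        [c + r for c, r in zip(center, rel_center)],
        [o + r for o, r in zip(offset, rel_offset)],
    )


def box_distance(v, center, offset, alpha=0.2):
    """Distance from entity point v to a box: full L1 distance outside the box,
    plus alpha times the L1 distance from the center to v clamped into the box."""
    q_max = [c + o for c, o in zip(center, offset)]
    q_min = [c - o for c, o in zip(center, offset)]
    dist_outside = sum(
        max(vi - hi, 0.0) + max(lo - vi, 0.0) for vi, lo, hi in zip(v, q_min, q_max)
    )
    dist_inside = sum(
        abs(c - min(hi, max(lo, vi))) for vi, c, lo, hi in zip(v, center, q_min, q_max)
    )
    return dist_outside + alpha * dist_inside


# Made-up 2-d embeddings (a trained model would learn these).
entities = {"USA": [3.2, 1.1], "France": [5.0, 3.0]}
obama = [1.0, 1.0]                      # anchor entity = box with zero offset
nationality = ([2.0, 0.0], [0.5, 0.5])  # relation: (center shift, offset growth)

# One-hop query (Obama, nationality, ?): project, then rank entities by distance.
center, offset = project(obama, [0.0, 0.0], *nationality)
ranking = sorted(entities, key=lambda e: box_distance(entities[e], center, offset))
print(ranking)  # ['USA', 'France']
```

Multi-hop queries chain project calls, so a whole query path becomes one box that can contain many answer entities at once.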
Luna
Discussion on Amazon Product Graph
Use KG + ML for search, Alexa, product recommendations
Knowledge Extraction involves
- Knowledge Alignment
- Knowledge cleaning
- Knowledge mining
Session 9 - What are some high value use cases of knowledge graphs?
Speakers: Jay Yu, Apoorv Saxena, David Newman
Jay
Discussion on KG at Intuit.
Use case: tax programming using a logic graph & KG.
Apoorv
Finance data mostly lives in RDBMSs
Risk assessment with company KG
Use Cases:
- Link Traversal - What % of revenue comes from Boeing?
- Page Rank
- Community Detection
- Link Prediction
- Graph embedding
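Of these, PageRank is the easiest to show end to end. Below is a minimal power-iteration sketch on a hypothetical company graph where an edge X -> Y means "X relies on Y as a supplier", so high scores mark companies that many others depend on, directly or indirectly. The graph, damping factor and tolerance are assumptions. In practice a library routine such as networkx.pagerank would normally be used instead.

```python
def pagerank(graph, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank. `graph` maps every node to a list of the nodes it
    links to; every link target must also be a key. Rank held by dangling nodes
    (no out-links) is spread uniformly over all nodes."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                new[v] += damping * rank[u] / len(graph[u])
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical company graph: X -> Y means "X relies on Y as a supplier".
graph = {
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
    "E": [],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # E
```

Ranks always sum to 1, so they can be read as the share of "dependency attention" each company receives, one possible input to a risk score.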
Leveraging BERT to translate natural language into Cypher queries
Session 10 - What are some open research questions on knowledge graphs?
Speakers: Richard Socher, Mark Musen, RV Guha
Richard
Deep Dive on real world use cases of KG with ML
Neural Tensor Network for knowledge base completion
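For reference, the NTN scores a candidate triple (e1, R, e2) roughly as follows, as I recall it from the NTN paper (Socher et al., 2013); worth double-checking the exact shapes against the paper. Here e1 and e2 are d-dimensional entity vectors, and W_R, V_R, b_R, u_R are the relation's parameters, with the bilinear term producing one value per tensor slice. A higher score means the triple is judged more likely to be true.

```latex
g(e_1, R, e_2) = u_R^{\top} \tanh\!\left( e_1^{\top} W_R^{[1:k]} e_2 + V_R \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b_R \right),
\qquad
e_1, e_2 \in \mathbb{R}^{d},\;
W_R^{[1:k]} \in \mathbb{R}^{d \times d \times k},\;
V_R \in \mathbb{R}^{k \times 2d},\;
b_R, u_R \in \mathbb{R}^{k}
```

Each slice i contributes e_1^T W_R^{[i]} e_2, so the bilinear tensor term lets every relation model several different multiplicative interactions between the two entity vectors.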
How to think about multi-hop link prediction / reasoning models?
Mark
Contrarian view - What do KGs really know?
Spoke about the MYCIN use case, developed in the 1970s, which was SoTA back then.
KGs were earlier known as semantic networks. Fast forward 50 years and we are in the same place.