Graph databases are very useful when we want to represent data in a visual, approachable way. One of the best use cases I can think of for a graph database is visualising the libraries used by a project and the vulnerabilities (if any) in those libraries. In short, a software bill of materials.
The code for this project is available in my personal GitHub repo.
The database we will use here is Neo4j, so the first thing we need to do is install it and create a database. There are plenty of tutorials on the official site, so if you are not familiar with it, that is a good starting point.
We also need Dependency-Check installed. This is an OWASP tool that scans open-source libraries for known vulnerabilities. We will take the report generated by Dependency-Check as input and ingest it into our database, but this solution can be adapted to work with any other tool.
The idea behind this is really simple. We have a JSON report previously generated by Dependency-Check. This report contains all the libraries and vulnerabilities for the scanned project. We parse this report, extract the relevant information and ingest it into our database to visualise it. Let’s go into a bit more detail:
We will have three different kinds of data to ingest into the database: a project, a dependency and a vulnerability. This is the information we will store for each of them:
Project:
- project_name: The name of the project we have scanned
Dependency:
- project_name: The list of projects where this dependency is included
- dependency: The name and version of the dependency
- package: The technology used for that dependency (Maven, npm, etc.)
- vulnerabilities: The list of CVEs or other identifiers for the vulnerabilities in that dependency
Vulnerability:
- vulnerability_name: The identifier for the vulnerability
- severity: The CVSS score for this vulnerability
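As a minimal sketch of the parsing step, the function below turns a report into the three kinds of records listed above. It is not the code from the repository. It assumes the report has a top-level `dependencies` list whose entries carry `fileName`, `packages[].id` (a package URL such as `pkg:maven/...`) and `vulnerabilities[]` with `name`, `cvssv3.baseScore` or `cvssv2.score`; check those keys against your own report. The helper names (`extract_records`, `package_type`, `cvss_score`, `load_report`) are hypothetical.

```python
import json


def package_type(package_id):
    """Return the ecosystem from an id such as 'pkg:maven/org.example/lib@1.0'."""
    if not package_id.startswith("pkg:"):
        return "unknown"
    return package_id[len("pkg:"):].split("/", 1)[0]


def cvss_score(vuln):
    """Prefer the CVSS v3 base score and fall back to the v2 score (assumed keys)."""
    if "cvssv3" in vuln:
        return vuln["cvssv3"].get("baseScore")
    if "cvssv2" in vuln:
        return vuln["cvssv2"].get("score")
    return None


def extract_records(report, project_name):
    """Split a parsed report into project, dependency and vulnerability records."""
    dependencies = []
    vulnerabilities = {}  # keyed by identifier so each vulnerability appears once
    for dep in report.get("dependencies", []):
        packages = dep.get("packages", [])
        package_id = packages[0]["id"] if packages else ""
        # Use the "name@version" tail of the package id, else the file name.
        name = package_id.rsplit("/", 1)[-1] if package_id else dep.get("fileName", "")
        vuln_names = []
        for vuln in dep.get("vulnerabilities", []):
            vuln_names.append(vuln["name"])
            vulnerabilities[vuln["name"]] = {
                "vulnerability_name": vuln["name"],
                "severity": cvss_score(vuln),
            }
        dependencies.append({
            "project_name": [project_name],
            "dependency": name,
            "package": package_type(package_id),
            "vulnerabilities": vuln_names,
        })
    return {
        "project": {"project_name": project_name},
        "dependencies": dependencies,
        "vulnerabilities": list(vulnerabilities.values()),
    }


def load_report(path):
    """Read a JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `extract_records(load_report("myreport.json"), "testjavi")` would then give one project record plus the dependency and vulnerability lists, ready to be written to the database.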
This is just the data to be ingested. In a graph database, on top of the data, we need to create relations:
- Project to dependency: we create this relation when a project name exists in the list of projects for a given dependency.
- Dependency to vulnerability: we create this relation when a vulnerability exists in the list of vulnerabilities for a given dependency.
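So, our structure looks like this: each project node is linked to its dependency nodes, and each dependency node is linked to its vulnerability nodes.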
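The two relation rules can be sketched as Cypher statements run from Python. This is an illustration under assumptions, not the repository's code: the relation names `USES` and `HAS_VULNERABILITY` are hypothetical (the article does not name them), and the node labels and properties are the ones from the data model above. The official `neo4j` Python driver is imported lazily so the sketch loads even where the driver is not installed.

```python
# Hypothetical relation names: USES and HAS_VULNERABILITY are illustrative only.
PROJECT_DEPENDENCY_QUERY = (
    "MATCH (p:project), (d:dependency) "
    "WHERE p.project_name IN d.project_name "
    "MERGE (p)-[:USES]->(d)"
)
DEPENDENCY_VULNERABILITY_QUERY = (
    "MATCH (d:dependency), (v:vulnerability) "
    "WHERE v.vulnerability_name IN d.vulnerabilities "
    "MERGE (d)-[:HAS_VULNERABILITY]->(v)"
)


def create_relations(session):
    """Run both relation queries on an open session (any object with .run)."""
    for query in (PROJECT_DEPENDENCY_QUERY, DEPENDENCY_VULNERABILITY_QUERY):
        # consume() waits until the statement has been fully executed.
        session.run(query).consume()


def connect_and_create_relations(uri, user, password):
    """Open a connection with the official driver and create the relations."""
    from neo4j import GraphDatabase  # lazy import: only needed when connecting

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            create_relations(session)
```

`MERGE` is used rather than `CREATE` so that ingesting the same report twice does not duplicate relations.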
Now that we have explained the model, it is time to run the tool and ingest data into our database. This step couldn’t be simpler.
You just need to set the database connection settings as environment variables.
Then run the Python script from the Git repository, passing as parameters the name of the project to be ingested and the path to the JSON report:
python ingest_data_neo4j.py testjavi myreport.json
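To illustrate how a script invoked like this might collect its settings, here is a small sketch. The environment variable names `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD` and the function `load_settings` are hypothetical; the real names are whatever the script in the repository expects.

```python
import argparse
import os

# Hypothetical variable names, chosen for illustration only.
ENV_VARS = {"uri": "NEO4J_URI", "user": "NEO4J_USER", "password": "NEO4J_PASSWORD"}


def load_settings(argv, env=os.environ):
    """Combine the two positional arguments with the database settings."""
    parser = argparse.ArgumentParser(
        description="Ingest a Dependency-Check report into Neo4j"
    )
    parser.add_argument("project_name", help="name of the project to ingest")
    parser.add_argument("report_path", help="path to the JSON report")
    args = parser.parse_args(argv)

    # Fail early with a clear message if any setting is missing or empty.
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))

    settings = {key: env[name] for key, name in ENV_VARS.items()}
    settings["project_name"] = args.project_name
    settings["report_path"] = args.report_path
    return settings
```

In a real script this would be called as `load_settings(sys.argv[1:])`. Reading credentials from the environment keeps them out of the command line and out of the repository.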
And that’s it! Now we have our data ingested in Neo4j.
Finally, it’s time to visualise the data we have ingested in the Neo4j Browser. We can visualise different things here, but I will leave the queries for the data that I find most useful:
List of dependencies for a given project
MATCH (m:project)--(a:dependency) WHERE m.project_name = 'testjavi' RETURN a, m
List of projects that use a given dependency
MATCH (m:project)--(a:dependency) WHERE a.dependency = 'email@example.com.RELEASE' RETURN a, m
List of all the dependencies and vulnerabilities in a project
MATCH (m:project)--(a:dependency) WHERE m.project_name = 'testjavi' OPTIONAL MATCH (a)--(v:vulnerability) RETURN a, m, v
And finally, list only the dependencies that contain vulnerabilities, and the projects that use them
MATCH (m:project)--(a:dependency) WHERE size(a.vulnerabilities) > 0 RETURN a, m
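The same kind of query can also be run from Python instead of the Neo4j Browser, for example to build a report. The sketch below is an assumption-laden illustration: the function `vulnerable_dependencies` is hypothetical, it expects an open session from the official `neo4j` Python driver, and it relies on the labels and properties of the data model described earlier. The project name is passed as a query parameter (`$project_name`) rather than pasted into the query string.

```python
DEPENDENCIES_QUERY = (
    "MATCH (m:project)--(a:dependency) "
    "WHERE m.project_name = $project_name "
    "RETURN a.dependency AS dependency, a.vulnerabilities AS vulnerabilities"
)


def vulnerable_dependencies(session, project_name):
    """Map each vulnerable dependency of a project to its vulnerability ids."""
    result = session.run(DEPENDENCIES_QUERY, project_name=project_name)
    return {
        record["dependency"]: list(record["vulnerabilities"])
        for record in result
        if record["vulnerabilities"]  # skip dependencies with no findings
    }
```

Using a parameter avoids quoting problems with names that contain special characters and lets the database reuse the query plan.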
What we have seen here is just an example of how to ingest a report from Dependency-Check and visualise that data, but it can be adapted to any Software Composition Analysis tool.
I hope you found this article useful!