CVE WebCrawler :spider_web:

Summary

The CVE WebCrawler is a project I worked on during my 2016 summer internship at Fasoo in Seoul, Korea. The crawler takes the XML files provided by the Common Vulnerabilities and Exposures (CVE) website and outputs a database file containing the URLs of pages with vulnerability-related source code, categorized by vulnerability type.
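
As a rough illustration of the first step, here is a minimal Python sketch of pulling reference URLs out of an XML feed. The sample document, its element and attribute names, and the function `extract_reference_urls` are all hypothetical and simplified for illustration; the real CVE files use their own schema, and this is not the code written at Fasoo.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified feed. The real CVE XML schema differs
# (for example, it may use namespaces and different element names).
SAMPLE_XML = """
<cve>
  <item name="CVE-0000-0001">
    <desc>Buffer overflow in an example parser.</desc>
    <refs>
      <ref url="https://example.com/advisory/1">Advisory</ref>
      <ref url="https://example.com/patch/1">Patch</ref>
    </refs>
  </item>
  <item name="CVE-0000-0002">
    <desc>SQL injection in an example login form.</desc>
    <refs>
      <ref url="https://example.com/advisory/2">Advisory</ref>
    </refs>
  </item>
</cve>
"""


def extract_reference_urls(xml_text):
    """Map each entry id to the list of reference URLs listed under it."""
    root = ET.fromstring(xml_text.strip())
    urls_by_entry = {}
    for item in root.iter("item"):
        entry_id = item.get("name")
        # Keep only <ref> elements that actually carry a url attribute.
        urls = [ref.get("url") for ref in item.iter("ref") if ref.get("url")]
        urls_by_entry[entry_id] = urls
    return urls_by_entry
```

Each URL collected this way would then be fetched and checked for code snippets.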


Background

Fasoo is a software company that provides unstructured data security and enterprise document platforms. Fasoo’s products include Wrapsody, Sparrow, and others. I worked on the team developing Sparrow, a static program analysis tool. To analyze code and find possible vulnerabilities, the team needed examples of vulnerable code. My job was to develop a crawler that would collect such snippets from the web.

Challenges

The project had several challenges, the largest being distinguishing plain text from source code. We did this with several regular expressions that match common code structures from various languages. We tuned these regular expressions over a few weeks to eliminate false positives, which we prioritized over reducing false negatives. Tuning was slow: each run of the program took several hours, and the only way to find false positives was to manually check every URL identified as containing a code snippet, which was also extremely time-consuming.
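
To give a feel for the approach, here is a minimal Python sketch of a regex-based code detector. The patterns, the `min_hits` threshold, and the function name `looks_like_code` are assumptions made for illustration; they are not the expressions we actually tuned, which I do not reproduce here.

```python
import re

# Illustrative patterns only; each targets one common code structure.
CODE_PATTERNS = [
    # C/C++ include directive, e.g. #include <stdio.h>
    re.compile(r"#include\s*<[\w./]+>"),
    # Brace-style control flow, e.g. if (x > 0) {
    re.compile(r"\b(?:if|for|while)\s*\([^)]*\)\s*\{"),
    # Call statement ending in a semicolon, e.g. printf("hi");
    re.compile(r"\b\w+\s*\([^()]*\)\s*;"),
    # Python-style definition line, e.g. def main():
    re.compile(r"^\s*(?:def|class)\s+\w+.*:\s*$", re.MULTILINE),
]


def looks_like_code(text, min_hits=2):
    """Return True when at least min_hits distinct patterns match the text.

    Requiring more than one match trades recall for precision, in line
    with preferring fewer false positives over fewer false negatives.
    """
    hits = sum(1 for pattern in CODE_PATTERNS if pattern.search(text))
    return hits >= min_hits
```

For example, a short C program containing both an `#include` line and a `printf("hi");` call matches two patterns and is flagged, while a sentence such as "The vendor released a patch (see advisory) for this issue." matches none.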

Program Running

Result

The project was successful: we developed the crawler as intended, with an extremely low false positive rate (0 when last tested). The program then categorized the URLs containing code snippets by vulnerability type and saved the output in a SQLite file.
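
The storage step can be sketched with Python's built-in `sqlite3` module. The table name `snippet_urls`, its columns, and the helper functions below are hypothetical; the actual output schema is not shown in this post.

```python
import sqlite3


def save_results(db_path, results):
    """Store (url, vulnerability_type) pairs, skipping exact duplicates."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS snippet_urls ("
                "url TEXT NOT NULL, "
                "vulnerability_type TEXT NOT NULL, "
                "UNIQUE (url, vulnerability_type))"
            )
            conn.executemany(
                "INSERT OR IGNORE INTO snippet_urls (url, vulnerability_type) "
                "VALUES (?, ?)",
                results,
            )
    finally:
        conn.close()


def count_by_type(db_path):
    """Return a dict mapping each vulnerability type to its URL count."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT vulnerability_type, COUNT(*) FROM snippet_urls "
            "GROUP BY vulnerability_type ORDER BY vulnerability_type"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

Keeping the vulnerability type as a column, rather than one table per type, makes it easy to query the URLs for any single category later.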

Output saved in sqlite
Daniel Choi

A developer who loves coffee
