National Software Reference Library (NSRL): The NIST Standards for a Reference Data Set

NIST are well known for defining standards for a range of technologies, many of which have since become de facto standards. This includes defining AES and the SHA-3 hashing method. Currently, they have two open competitions which will be closing soon: PQC (Post-Quantum Cryptography) and LWC (Lightweight Cryptography).

But what about digital forensics tools? Surely there must be some standards related to the trustworthiness of the tools that are used, as their evidence could be used in a court of law. Well, our spin-out company, Cyacomb, identified many weaknesses in the current approaches to using hash values for the identification of contraband, especially in the type of hashing methods used and in the time it takes to scan disks (which can take up to a day to archive and search). They are currently leading the way in fast and efficient searching methods for contraband content:

Ref: here

The National Software Reference Library

And so, NIST created the National Software Reference Library (NSRL), which contains hashes of known software applications and their files. Overall, this is defined as the Reference Data Set (RDS), and it is used by many law enforcement and government agencies. Basically, it allows the MD5 and SHA-1 hashes, or the CRC-32 checksum, of files found on a system to be matched against known software. It should be noted that some of the entries relate to software applications and associated data elements that may be considered malicious, such as hacking scripts and obfuscation tools, but a match on its own only shows that a file is known, and not that it is malicious. It should also be noted that there are no hash values of illicit data; this type of dataset is supported by other agencies.
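
As a rough illustration of how these fingerprints are produced, the following Python sketch computes all three for a stream of bytes using the standard hashlib and zlib modules. The function name fingerprints and the upper-case hex output (chosen to match the style of the RDS listing) are assumptions for this example, and are not part of any NSRL tooling:

```python
import hashlib
import io
import zlib


def fingerprints(stream, chunk_size=1 << 20):
    """Return the SHA-1, MD5 and CRC-32 of a binary stream as upper-case hex."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    # Read in chunks so that large files do not need to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha1.update(chunk)
        md5.update(chunk)
        crc = zlib.crc32(chunk, crc)  # running CRC-32 over all chunks
    return {
        "SHA-1": sha1.hexdigest().upper(),           # 160 bits = 40 hex chars
        "MD5": md5.hexdigest().upper(),              # 128 bits = 32 hex chars
        "CRC32": format(crc & 0xFFFFFFFF, "08X"),    # 32 bits = 8 hex chars
    }


# In-memory data stands in for a file opened with open(path, "rb").
result = fingerprints(io.BytesIO(b"abc"))
print(result)
```

For a real file, the same function can be called on the object returned by open(path, "rb").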

And so, NIST has now released NSRL Version 3.0, which has improved search performance and has increased the dataset size to over a billion entries. It also supports the SQLite format, which allows for improved use of filters in searches.
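
To give a feel for why a SQLite format helps with filtering, here is a minimal Python sketch using the standard sqlite3 module. The table known_files and its columns are hypothetical and made up for this example (they are not the actual RDS schema), and the two rows are taken from the sample listing in this article:

```python
import sqlite3

# Hypothetical table layout for illustration only; NOT the real RDS schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE known_files ("
    " sha1 TEXT PRIMARY KEY, md5 TEXT, crc32 TEXT,"
    " file_name TEXT, file_size INTEGER)"
)
conn.executemany(
    "INSERT INTO known_files VALUES (?, ?, ?, ?, ?)",
    [
        ("000000F694CA9BF73836D67DEB5E2724338B422D",
         "497C460BBA43530494F37DF7DE3A5FF4", "46B80AC7", "bpa10x.ko", 12944),
        ("00000141082D416B7909164C580751808E7C11E4",
         "E7990319759290BB6E0D17D7C685D203", "F6A2F49D", "ultoa.o", 692),
    ],
)


def lookup_sha1(conn, sha1_hex):
    """Return (file_name, file_size) for a SHA-1 hash, or None if unknown."""
    return conn.execute(
        "SELECT file_name, file_size FROM known_files WHERE sha1 = ?",
        (sha1_hex.upper(),),
    ).fetchone()


def files_larger_than(conn, min_size):
    """Example of a filter: names of files above a given size in bytes."""
    rows = conn.execute(
        "SELECT file_name FROM known_files"
        " WHERE file_size > ? ORDER BY file_name",
        (min_size,),
    )
    return [name for (name,) in rows]


print(lookup_sha1(conn, "000000f694ca9bf73836d67deb5e2724338b422d"))
print(files_larger_than(conn, 1000))
```

With an index on the hash column, an exact-match lookup like this avoids scanning the whole dataset, and extra conditions can be added to the WHERE clause as filters.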

The dataset format defines the digital fingerprint elements of SHA-1 (a 160-bit hash), MD5 (a 128-bit hash) and CRC-32 (a 32-bit checksum), along with the file name, file size, product code, operating system code and special code:

"SHA-1", "MD5", "CRC32", "FileName", "FileSize", "ProductCode", "OpSystemCode", "SpecialCode"
"0000001FFEF4BE312BAB534ECA7AEAA3E4684D85","344428FA4BA313712E4CA9B16D089AC4","7516A25F",".text._ZNSt14overflow_errorC1ERKSs",33,219181,"362",""
"00000052A9EEEC6C8348CFB2AEA77BC1FBF8D239","F46CA74CA3D89E9D3CF8D8E5CD77842D","2F9CC135","__DATA__mod_init_func",772,218747,"362",""
"00000079FD7AAC9B2F9C988C50750E1F50B27EB5","8ED4B4ED952526D89899E723F3488DE4","7A5407CA","wow64_microsoft-windows-i..timezones.resources_31bf3856ad364e35_10.0.16299.579_de-de_f24979c73226184d.manifest",2520,190718,"362",""
"000000F694CA9BF73836D67DEB5E2724338B422D","497C460BBA43530494F37DF7DE3A5FF4","46B80AC7","bpa10x.ko",12944,17066,"362",""
"00000114EEAA69CF30652FFA459D9E167B132C06","7C36BE0D2BF2520D564D36C6F4241B4F","66E07FC3",".text",1130496,223308,"362","""0000011CF33DF2A2A10EF407E70912DC55F50C49","EAEB051BACDB9D67605659E3DF80C48C","74F27585","package_3482_for_kb4462939~31bf3856ad364e35~amd64~~10.0.1.5.cat",10660,204580,"362",""
"00000141082D416B7909164C580751808E7C11E4","E7990319759290BB6E0D17D7C685D203","F6A2F49D","ultoa.o",692,220911,"362",""
"00000178E84480AF35A484E8EC71B6C591B38507","9A872042A9CD96B4FB13901000B91982","97D3B7E8","microsoft-windows-internet-browserppipro-package~31bf3856ad364e35~x86~sl-si~10.0.19041.906.cat",8897,236113,"362",""
"000001BB80E9C6F9CACB6DA82F4D2E3266B9C4C3","3491EE38124BF5382D0828C5209C83B5","6CC040F2","Batman_Seventies.POR",90,196184,"362",""
"0000030F6D93EC90BDEA54B08BF7B512B13F55F9","CC6B8BA59F74F251DBCA14962A156C9D","ECEDDFD8",".rodata",173816,220501,"362",""
"000003191A199BFA961C18A6F71FF2ED04D0F9DA","84B2CE4DC226E61470EC240593CCBFF3","CC6201BD",".rdata",5120,221574,"362",""
"0000034F77D9314B1B94DBDA3031BECE1198D067","FE330C56554EF007D38C89764864E365","71C6F991","arm64_49016ecbe73216140477e3b16492e87f_31bf3856ad364e35_10.0.17134.81_none_ae8f44b72b46370a.manifest",705,188511,"362",""
"000003802D91BC41F5C89BB6115903ABC35372AB","F85BA698CA9E66D39BA8E223602E136E","41195B49",".gnu.version",192,226190,"362",""
"000005BD2D48EAA23C01EE5561B9C6F164E89B41","858DEA54B3CBE4664F6652C37180A8AE","210F55CB","ScBrPls1.A05D7955_E27E_48E7_843F_456A4A59DC3A",456632,226257,"362",""
"000005D7D418D463A849D0FD6AB01A1982D6C8D6","0DD50DF49C7E9C01B97038FAE5A077E1","7B608B44",".text",5460480,182055,"362","""000005D7F73491D207C9B34EC0D3720B2767CE93","849C766653FB4C4C6E9727175FE4974B","16C39D0D",".rela.rodata",23328,263769,"362",""

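Since the records above are comma-separated with quoted text fields, they can be read with a standard CSV parser. The following is a minimal Python sketch using the csv module; the helper name read_records is an assumption for this example, and skipinitialspace=True is only there to cope with the spaces after the commas in the header line as it is shown in this article:

```python
import csv
import io

# Header and two records copied from the listing above.
SAMPLE = '''"SHA-1", "MD5", "CRC32", "FileName", "FileSize", "ProductCode", "OpSystemCode", "SpecialCode"
"000000F694CA9BF73836D67DEB5E2724338B422D","497C460BBA43530494F37DF7DE3A5FF4","46B80AC7","bpa10x.ko",12944,17066,"362",""
"00000141082D416B7909164C580751808E7C11E4","E7990319759290BB6E0D17D7C685D203","F6A2F49D","ultoa.o",692,220911,"362",""
'''


def read_records(text_stream):
    """Yield one dict per record, keyed by the column names in the header."""
    reader = csv.DictReader(text_stream, skipinitialspace=True)
    for row in reader:
        # The size and product code are unquoted numbers in the listing.
        row["FileSize"] = int(row["FileSize"])
        row["ProductCode"] = int(row["ProductCode"])
        yield row


records = list(read_records(io.StringIO(SAMPLE)))
for rec in records:
    print(rec["SHA-1"], rec["FileName"], rec["FileSize"])
```

Reading the records one at a time, rather than loading the whole file, matters for a dataset of this size.
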
This file can be easily loaded into a spreadsheet, or searched as free-form text.
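
A free-form search over the file names can be sketched in a few lines of Python. This is only one way of doing it, assuming a case-insensitive substring match on the FileName column; the function name search_filenames is made up for this example, and the sample rows are taken from the listings in this article:

```python
import csv
import io

SAMPLE = '''"SHA-1", "MD5", "CRC32", "FileName", "FileSize", "ProductCode", "OpSystemCode", "SpecialCode"
"000000F694CA9BF73836D67DEB5E2724338B422D","497C460BBA43530494F37DF7DE3A5FF4","46B80AC7","bpa10x.ko",12944,17066,"362",""
"00979D99B3E588C4F2DC3ACC06688911A5ADDF7D","104E3322D11A6CBFE68C30619958E010","2B24A68B","SUNRISE_edinburgh_roll_up_01.fcr",15079,234558,"362",""
"038CC36BA931F8D7F7A565D64D79550E3B1ECB06","0EAFF011B221ADEAF141A2758F693AA4","3B1DADC9","MS_START_edinburgh_hairpin_05.fcr",11547,234558,"362",""
'''


def search_filenames(text_stream, term):
    """Return the rows whose file name contains `term`, ignoring case."""
    term = term.lower()
    reader = csv.reader(text_stream, skipinitialspace=True)
    header = next(reader)                 # first line holds the column names
    name_col = header.index("FileName")
    return [row for row in reader if term in row[name_col].lower()]


hits = search_filenames(io.StringIO(SAMPLE), "Edinburgh")
for row in hits:
    print(row[0], row[3])
```

This is a linear scan, so it reads every record; that is fine for an occasional search, but it is much slower than a lookup by hash.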

If we search for “edinburgh”, we get the first few entries of:

"00979D99B3E588C4F2DC3ACC06688911A5ADDF7D","104E3322D11A6CBFE68C30619958E010","2B24A68B","SUNRISE_edinburgh_roll_up_01.fcr",15079,234558,"362",""
"031EFB8F57130B2B69F6A59036A81A60F52705E1","9D19BBB753E7C747AFD49AECFD48423A","27CA4E07","PGC0567_beautyspot_edinburgh_01.xml",15549,234558,"362",""
"038CC36BA931F8D7F7A565D64D79550E3B1ECB06","0EAFF011B221ADEAF141A2758F693AA4","3B1DADC9","MS_START_edinburgh_hairpin_05.fcr",11547,234558,"362",""
"03C67D7C38D4848326F6EEF8652B5C4773150F93","838C4DC0F2326B27091976F35E138370","AFC47DED","PGC0830_player_house_edinburgh_castle_beauty_shot_01.xml",13716,234558,"362",""
"0725329609E2BE5AA03F07A6394A3071F5B09600","C0758D85BC8F1EF9F44311E65B68289B","1F1584B8","MS_START_edinburgh_hairpin_01.fcx",5515,234558,"362",""
"07A1709AF274DE310487A0B6FB26BD98664FCA0E","6EECD945C14D21920F965111D6D9FF6D","30B0EFC4","SUNRISE_edinburgh_castle_roll_up.fcr",18197,234558,"362",""
"08CADC4DFCEC70364F8941F88AC32FED0CD2F411","7925DF4270281101204303FF808617FD","7832C34A","MS_START_edinburgh_hairpin_07.fcx",4631,234558,"362",""
"0A5992F39416C0C7E46FF2F6E04EA717F057EF7B","9A756EDC615A331481AF0C1CC46A3ED1","7517C796","MS_START_edinburgh_hairpin_08.fcx",4495,234558,"362",""

In this case, we see the file “SUNRISE_edinburgh_roll_up_01.fcr”, which is likely to be a compressed Torrent file. And here is the start of the “Bitcoin” file search:

And for “scotland”, hopefully you won’t have this version of the “scotlandkiltandpipes.png” graphic, with an MD5 hash of “02CDD9ED97ECBB5B5C5EC4EE22352317”, on your machine:

One thing to notice here is that the SHA-1 hashes are ordered. This makes it easier to search for a given SHA-1 hash, as we can go halfway into the list, find out whether our hash is greater than or less than the value there, then halve the remaining range again, and so on. This method is known as a binary search, and means that for four billion hash values, we can find our value within 32 comparisons.
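
The halving idea can be sketched in Python as follows. This is a minimal example, assuming the hashes are held in memory as a sorted list of upper-case hex strings of equal length (so that string order matches numeric order); the function name binary_search is made up for this example, and the four hashes come from the start of the listing in this article:

```python
import math

# Four SHA-1 values from the sample listing, already in sorted order.
HASHES = [
    "0000001FFEF4BE312BAB534ECA7AEAA3E4684D85",
    "00000052A9EEEC6C8348CFB2AEA77BC1FBF8D239",
    "00000079FD7AAC9B2F9C988C50750E1F50B27EB5",
    "000000F694CA9BF73836D67DEB5E2724338B422D",
]


def binary_search(sorted_hashes, target):
    """Return (index, probes); index is -1 if the target is not present."""
    lo, hi, probes = 0, len(sorted_hashes), 0
    while lo < hi:
        mid = (lo + hi) // 2          # go halfway into the remaining range
        probes += 1
        if sorted_hashes[mid] < target:
            lo = mid + 1              # target can only be in the upper half
        elif sorted_hashes[mid] > target:
            hi = mid                  # target can only be in the lower half
        else:
            return mid, probes
    return -1, probes


print(binary_search(HASHES, "00000079FD7AAC9B2F9C988C50750E1F50B27EB5"))
print(binary_search(HASHES, "F" * 40))

# Each probe halves the range, so about log2(n) probes are needed.
print(math.ceil(math.log2(4_000_000_000)))
```

The last line gives the 32 quoted above, since 2 to the power of 32 is just under 4.3 billion.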

Conclusions

There are lots of applications for the dataset. You can get it here:

While MD5 collisions are easy to produce, they are much harder to produce with SHA-1, although a practical SHA-1 collision was demonstrated in 2017 (the SHAttered attack). And so, I hope law enforcement can start to move toward using SHA-256 hashes, for which no practical collisions are known.