.ASM files ignored? #6943
-
You beat me to it. It's quite common for files that have come from Windows systems to be identified as binaries, but it's usually because of a byte order mark (BOM), which IIRC things like Visual Studio and other older dev tools add to files. This is a new one for me.
It's not up to Linguist. We use charlock_holmes to detect binaries (linguist/lib/linguist/blob_helper.rb, lines 122 to 142 in 39fd5e9):
irb(main):001> require 'charlock_holmes'
=> true
irb(main):002> contents = File.read('ADVGRP.ASM')
=> "; [ This translation created 10-Feb-83 by Version 4.3 ]\r\n\r\n\t.RADIX 8\t\t; To be safe\r\n\r\nCSEG\tSEGMENT PUBLIC 'CODESG' \r\n\tASSUME CS:CSEG\r\n\r\nINCLUDE\tOEM.H\r\n\r\n\tTITLE ADVGRP - ADVANCE...
irb(main):003> CharlockHolmes::EncodingDetector.detect(contents)
=> {:type=>:binary, :confidence=>100}
-
Yes
Possibly. We'd have to be mindful of the risk of stripping legit uses (I have no idea if there are any).
-
I was looking at https://github.com/microsoft/GW-BASIC and it contains a lot of Assembly code (.ASM files).
However, when running github-linguist, it seems that those files are ignored. Am I missing something?
github-linguist output:
output for specific file:
Why is it detected as binary in the GW-BASIC repo and detected correctly in https://github.com/microsoft/MS-DOS?
EDIT: After opening the file in VS Code, it turns out there are a few NUL bytes at the end of the file.
Should Linguist ignore NULs at the end of a file?
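For anyone who wants to confirm the trailing NULs without an editor, here is a minimal Ruby sketch. The helper names `trailing_nul_count` and `strip_trailing_nuls` are hypothetical and are not part of Linguist or charlock_holmes; the sample string only stands in for the real file contents.

```ruby
# Hypothetical helpers for illustration; not part of Linguist or charlock_holmes.

# Counts the NUL bytes at the very end of the given data.
def trailing_nul_count(data)
  bytes = data.b                                   # binary copy, so encoding does not matter
  bytes.bytesize - bytes.sub(/\x00+\z/n, '').bytesize
end

# Returns a binary copy of the data with trailing NUL bytes removed.
# NUL bytes elsewhere in the data are left untouched.
def strip_trailing_nuls(data)
  data.b.sub(/\x00+\z/n, '')
end

# Stand-in for file contents; a real file could be read with File.binread(path).
sample = "\tTITLE ADVGRP\r\n\x00\x00\x00"
puts trailing_nul_count(sample)                    # prints 3
puts strip_trailing_nuls(sample).include?("\x00")  # prints false
```

With the real file, something like `strip_trailing_nuls(File.binread('ADVGRP.ASM'))` could be handed to `CharlockHolmes::EncodingDetector.detect` to see whether the result changes. This sketch does not verify what the detector reports afterwards.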