Fix inconsistent path behaviors when running diffs #581

Merged: egibs merged 3 commits into chainguard-dev:main from fix-diff-behavior on Nov 5, 2024

egibs commented Nov 4, 2024

Closes: #565

This PR should fix the behavior described in the issue above. There were a couple of root causes, and they were most obvious when scanning /tmp paths on macOS, where /tmp is a symlink to /private/tmp.

Knowing this, I updated the code to evaluate symlinks and made the behavior consistent between scanning directories and scanning files (the /tmp -> /private/tmp symlink was making the relative path incorrect).
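For readers unfamiliar with the symlink problem, here is a minimal sketch of resolving a path before computing anything relative to it. It is not the code from this PR: resolvePath and symlinkDemo are hypothetical names, and the demo builds its own temporary symlink instead of relying on /tmp, so it behaves the same on Linux and macOS.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolvePath (hypothetical helper) returns an absolute path with every
// symlink component resolved, so /tmp/x and /private/tmp/x compare equal
// on macOS.
func resolvePath(p string) (string, error) {
	abs, err := filepath.Abs(p)
	if err != nil {
		return "", err
	}
	return filepath.EvalSymlinks(abs)
}

// symlinkDemo creates a real directory and a symlink pointing at it, then
// resolves both. It returns the two resolved paths for comparison.
func symlinkDemo() (string, string, error) {
	root, err := os.MkdirTemp("", "diff-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)

	realDir := filepath.Join(root, "private")
	if err := os.Mkdir(realDir, 0o755); err != nil {
		return "", "", err
	}
	link := filepath.Join(root, "link")
	if err := os.Symlink(realDir, link); err != nil {
		return "", "", err
	}

	viaLink, err := resolvePath(link)
	if err != nil {
		return "", "", err
	}
	direct, err := resolvePath(realDir)
	if err != nil {
		return "", "", err
	}
	return viaLink, direct, nil
}

func main() {
	viaLink, direct, err := symlinkDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// After resolution the symlinked path and the real path are identical.
	fmt.Println(viaLink == direct)
}
```

Resolving both the scan root and each file path the same way keeps the later relative-path computation from mixing a symlinked prefix with a resolved one.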

Additionally, when file paths were passed in directly, fromPath would match the scan path of the file itself, so the relative path came out as ".". If both the src and dest reports contain the "." key, completely unrelated files appear to be modifications of each other, which is incorrect.

By checking whether fromPath is a directory, we can set fromPath so that the relative path between it and files.Value.Path is essentially the filename, which is the key stored in the map we reference.
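To illustrate the key collision and the directory check, here is a minimal sketch, not the actual diff code. naiveKey, relKey, and the isDir parameter are hypothetical; the real code would presumably stat fromPath to decide whether it is a directory.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// naiveKey mirrors the problematic behavior: the key is the file path
// relative to the scan path, whatever the scan path is.
func naiveKey(scanPath, file string) (string, error) {
	return filepath.Rel(scanPath, file)
}

// relKey (hypothetical) uses the scan path as the base when it is a
// directory, and its parent directory when it is a single file, so the
// key for a directly passed file is its filename.
func relKey(scanPath, file string, isDir bool) (string, error) {
	base := scanPath
	if !isDir {
		base = filepath.Dir(scanPath)
	}
	return filepath.Rel(base, file)
}

func main() {
	src := "old/lottie-player.min.js"
	dest := "/bin/bash"

	// Each file relative to itself is ".", so two unrelated files collide.
	srcNaive, _ := naiveKey(src, src)
	destNaive, _ := naiveKey(dest, dest)
	fmt.Println(srcNaive, destNaive)

	// With the directory check, each file is keyed by its own name.
	srcKey, _ := relKey(src, src, false)
	destKey, _ := relKey(dest, dest, false)
	fmt.Println(srcKey, destKey)
}
```

With distinct keys, unrelated files no longer look like a modification of each other, while two files with the same name still line up as a change.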

As a last resort, I set fromRoot to fromPath if the former evaluates to ".".

A couple of examples with the fix:
Unrelated files in separate paths:

$ go run cmd/mal/mal.go diff ./out/samples-ec1ba5f2dc0e1f7085a0af73aa0f6fb1043e7534/javascript/clean/lottie-player.min.js /bin/bash
├─ 🟡 Deleted: lottie-player.min.js [MEDIUM]
│     ≡ data [LOW]
│       🟢 encoding/json_decode — Decodes JSON messages: JSON.parse
│       🟢 encoding/json_encode — encodes JSON: JSON.stringify
│     ≡ execution [MEDIUM]
│       🟢 plugin — references a 'plugin':
│            function installPlugin, getExpressionsPlugin, plugins, return expressionsPlugin, setExpressionsPlugin
│       🟡 remote_commands/code_eval — evaluate code dynamically using eval(): eval("
│     ≡ networking [MEDIUM]
│       🟡 download — download files: download_
│       🟢 url/embedded — contains embedded HTTPS URLs: https://www.jsdelivr.com/using-sri-with-dynamic-files
│       🟢 url/parse — Handles URL strings: new URL
│     ≡ operating-system [MEDIUM]
│       🟡 time/clock_sleep — uses setInterval to wait: setInterval(
│
├─ 🟡 Added: bash [MEDIUM]
│     ≡ credential [MEDIUM]
│       🟡 shell/bash_history — access .bash_history file
│     ≡ discovery [MEDIUM]
│       🟡 group/lookup — get entry from group database: endgrent, getgrent, setgrent
│       🟢 system/hostname_get — get computer host name: gethostname
│     ≡ evasion [MEDIUM]
│       🟡 hidden_files/var_tmp — path reference within /var/tmp: var/tmp/
│     ≡ execution [MEDIUM]
│       🟡 dylib/symbol_address — get the address of a symbol: dlsym
│       🟡 program — executes external programs: execve
│       🟢 program/background — wait for process to exit: waitpid
│       🟢 shell/SHELL — path to active shell: SHELL
│       🟢 shell/TERM — Look up or override terminal settings: TERM
│       🟡 shell/bash_dev_udp — uses /dev/udp for network access (bash)
│       🟡 shell/exec — executes shell: /bin/sh, /bin/zsh
│       🟡 tty/pathname — returns the pathname of a terminal device: ttyname
│     ≡ filesystem [MEDIUM]
│       🟢 file/delete — deletes files: unlink
│       🟢 link_read — read value of a symbolic link: readlink
│       🟢 path/etc — path reference within /etc: /etc/hosts, /etc/inputrc, /etc/profile
│       🟡 path/etc_hosts — references /etc/hosts
│       🟡 path/tmp — path reference within /tmp: /tmp/
│       🟡 path/usr_local — path reference within /usr/local/bin
│       🟢 path/var — path reference within /var:
│            /var/db/ManagedConfigurationFiles/com.apple.bash/etc/profile, /var/maiH, /var/mail, /var/tmp/
│       🟡 permission/modify — modifies file permissions: chmod
│       🟢 tempdir — looks up location of temp directory: TMPDIR
│       🟢 tempdir/tempfile_create — Uses mktemp to create temporary files: temp file
│     ≡ impact [MEDIUM]
│       🟡 remote_access/reverse_shell — references a reverse shell: /bin/sh, socket
│     ≡ networking [MEDIUM]
│       🟢 resolve/hostport_parse — Network address and service translation: freeaddrinfo, getaddrinfo
│       🟡 socket/connect — initiate a connection on a socket: _connect
│       🟡 socket/listen — listen on a socket: accept
│       🟢 socket/peer_address — get peer address of connected socket: getpeername
│       🟡 tcp/ssh — Uses SSH (secure shell) service
│     ≡ process [LOW]
│       🟢 chdir — changes working directory: cd which change the
│       🟢 create — create child process: _fork
│       🟢 executable_path — sets a custom PATH: /bin:/usr/, /usr/bin:/sbin
│       🟢 groupid_set — set real and effective group ID of process: setgid
│       🟢 parent_pid_get — gets parent process ID: getppid
│       🟢 userid_set — set real and effective user ID of current process: setuid

Same parent directory, related files, external scan:

$ go run cmd/mal/mal.go diff /tmp/old /tmp/new
├─ 🛑 Changed: /private/tmp/new/lottie-player.min.js [MEDIUM → CRITICAL]
│     ▲ anti-static [NONE → CRITICAL]
++      🟡 obfuscation/generic/hex_conversion — converts hex data to ASCII: toString("hex");
++      🟠 obfuscation/js/bitwise — uses an excessive amount of unsigned bitwise math:
++           a>>>0, a>>>11, a>>>13, a>>>15, a>>>16, a>>>22, a>>>24, a>>>25, a>>>26, a>>>31, a>>>32, a>>>6, a>>>8, b>>>0…
++      🟡 obfuscation/js/char_codes — obfuscated javascript that relies on character manipulation:
++           charAt, charCodeAt, const, fromCharCode, function(, length, push, shift, toString, {return
++      🟠 obfuscation/js/char_to_int — converts manipulated numbers into characters:
++           charAt(a>>>6*(3-l)&6, charAt(n%l),t--)}(, charAt(s>>>6*(3-a)&6, function(
++      🛑 obfuscation/js/ebe — highly obfuscated javascript (eBe):
++           charCodeAt, eBe(-1), eBe(-10), eBe(-11), eBe(-12), eBe(-13), eBe(-14), eBe(-15), eBe(-16), eBe(-17), eBe(-…
++      🟠 obfuscation/js/int_to_char — converts incremented numbers into characters:
++           charCodeAt(++a), charCodeAt(++d), charCodeAt(++n), charCodeAt(++s), function(
++      🟠 obfuscation/js/power — uses many powered array elements (>25):
++           charAt(a, charAt(c, charAt(n, charAt(s, charAt(t, charAt(u, charAt(w, e[0]^e[10], e[10]^e[20], e[11]^e[21]…
│     ▲ command & control [NONE → HIGH]
++      🟡 addr/ip — hardcoded IP address:
++           114.243.154.69, 13.182.181.343, 13.23.32.42, 14.22.33.243, 14.52.54.92, 146.288.257.686, 15.15.34.34, 15.2…
++      🟠 addr/url_unusual — Contains HTTP hostname with unusual top-level domain:
++           https://api.mantlescan.xyz/, https://mantlescan.xyz/, https://openchain.xyz/
│     ▲ credential [NONE → MEDIUM]
++      🟡 keychain — May access the macOS keychain
++      🟢 password — references a 'password': PasswordBasedCipher, to countless passwords
++      🟢 ssl/private_key — References private keys: privateKey
│     ▲ cryptography [NONE → MEDIUM]
++      🟢 aes — Supports AES (Advanced Encryption Standard)
++      🟡 blockchain — Uses a blockchain
++      🟢 ed25519 — Elliptic curve algorithm used by TLS and SSH: ed25519
++      🟢 hmac — Uses HMAC (Hash-based Message Authentication Code): HMAC.init
++      🟡 uuid — generates a random UUID: randomUUID
│     ▲ data [LOW → HIGH]
│       🟢 encoding/json_decode — Decodes JSON messages: JSON.parse
│       🟢 encoding/json_encode — encodes JSON: JSON.stringify
++      🟠 builtin/appkit — Includes AppKit, a web3 blockchain library:
++           Price impact reflects the change in market price due to your trade, Select which chain to connect to your …
++      🟡 embedded/base64_url — Contains base64 url: odHRwOi8v::$http
++      🟢 encoding/base64 — Supports base64 encoded strings
++      🟢 encoding/qr_code — works with QR Codes
│     ▲ discovery [NONE → MEDIUM]
++      🟡 system/platform — get system identification: process.platform, process.versions
│     ≡ execution [MEDIUM]
│       🟢 plugin — references a 'plugin':
│            function installPlugin, getExpressionsPlugin, plugins, return expressionsPlugin, setExpressionsPlugin
│       🟡 remote_commands/code_eval — evaluate code dynamically using eval(): eval("
│     ▲ exfiltration [NONE → CRITICAL]
++      🟡 stealer/browser — Uses HTTP, archives, and references multiple browsers:
++           .config, Brave, Chrome, Discord, Firefox, Opera, POST, Safari, https, zip
++      🟡 stealer/credit_card — references 'credit card'
++      🛑 stealer/wallet — makes HTTPS connections and references multiple wallets by name:
++           BraveWallet, CoinbaseBrowser, CoinbaseConnector, CoinbaseInjectedProvider, CoinbaseInjectedSigner, Coinbas…
│     ▲ filesystem [NONE → MEDIUM]
++      🟢 file/open — opens files: open(
++      🟢 mount — mounts file systems
++      🟡 path/relative — references and possibly executes relative path:
++           ./aes, ./blowfish, ./cipher-core, ./core, ./evpkdf, ./format-hex, ./hmac, ./lib-typedarrays, ./mode-cfb, .…
│     ▲ impact [NONE → MEDIUM]
++      🟡 remote_access/agent — references an 'agent': useragent
++      🟡 remote_access/heartbeat — references a 'heartbeat':
++           heartBeatTimeout, heartbeat_pulse, lastHeartbeatResponse, updateLastHeartbeat
++      🟡 resource/bank_xfer — references 'bank transfer'
│     ≡ networking [MEDIUM]
│       🟡 download — download files: download_
│       🟢 url/embedded — contains embedded HTTPS URLs: https://www.jsdelivr.com/using-sri-with-dynamic-files
│       🟢 url/parse — Handles URL strings: new URL
++      🟡 http/form_upload — upload content via HTTP form: POST, application/json, application/x-www-form-urlencoded
++      🟡 http/post — submits content to websites: Content-Type, HTTP, POST, http
++      🟡 http/websocket — supports web sockets:
++           WalletLinkWebSocket, WebSocket:gV, WebSocket:typeof, WebSocketClass:h, WebSocketClass:l, clearWebSocket, w…
++      🟡 ip/addr — mentions an 'IP address': ipAddr
++      🟢 resolve/hostport_parse — Network address and service translation: getaddrinfo
++      🟡 socket/listen — listen on a socket: accept
++      🟢 socket/send — send a message to a socket: _send
++      🟡 url/encode — encodes URL, likely to pass GET variables: urlencode
++      🟡 url/request — requests resources via URL: requests.get(e)
++      🟡 webrtc — makes outgoing WebRTC connections, uses blockchain: RTCPeerConnection
│     ▲ operating-system [MEDIUM → LOW]
--      🟡 time/clock_sleep — uses setInterval to wait
++      🟢 env/get — Retrieve environment variable values: env.DEBUG, env.MODE, env.NEXT, env.NODE
++      🟢 fd/read — reads from a file handle: e.read()
++      🟢 fd/write — writes to a file handle:
++           a.write(o), decoder.write(n), decoder.write(t), e.write(t), i.write(e), t.write(o), this.write(e)
│

Same parent directory, related files, internal scan:

$ $HOME/go/1.23.2/bin/mal diff old new
├─ 🛑 Changed: new/lottie-player.min.js [MEDIUM → CRITICAL]
│     ▲ anti-static [NONE → CRITICAL]
++      🟡 obfuscation/generic/hex_conversion — converts hex data to ASCII: toString("hex");
++      🟠 obfuscation/js/bitwise — uses an excessive amount of unsigned bitwise math:
++           a>>>0, a>>>11, a>>>13, a>>>15, a>>>16, a>>>22, a>>>24, a>>>25, a>>>26, a>>>31, a>>>32, a>>>6, a>>>8, b>>>0…
++      🟡 obfuscation/js/char_codes — obfuscated javascript that relies on character manipulation:
++           charAt, charCodeAt, const, fromCharCode, function(, length, push, shift, toString, {return
++      🟠 obfuscation/js/char_to_int — converts manipulated numbers into characters:
++           charAt(a>>>6*(3-l)&6, charAt(n%l),t--)}(, charAt(s>>>6*(3-a)&6, function(
++      🛑 obfuscation/js/ebe — highly obfuscated javascript (eBe):
++           charCodeAt, eBe(-1), eBe(-10), eBe(-11), eBe(-12), eBe(-13), eBe(-14), eBe(-15), eBe(-16), eBe(-17), eBe(-…
++      🟠 obfuscation/js/int_to_char — converts incremented numbers into characters:
++           charCodeAt(++a), charCodeAt(++d), charCodeAt(++n), charCodeAt(++s), function(
++      🟠 obfuscation/js/power — uses many powered array elements (>25):
++           charAt(a, charAt(c, charAt(n, charAt(s, charAt(t, charAt(u, charAt(w, e[0]^e[10], e[10]^e[20], e[11]^e[21]…
│     ▲ command & control [NONE → HIGH]
++      🟡 addr/ip — hardcoded IP address:
++           114.243.154.69, 13.182.181.343, 13.23.32.42, 14.22.33.243, 14.52.54.92, 146.288.257.686, 15.15.34.34, 15.2…
++      🟠 addr/url_unusual — Contains HTTP hostname with unusual top-level domain:
++           https://api.mantlescan.xyz/, https://mantlescan.xyz/, https://openchain.xyz/
│     ▲ credential [NONE → MEDIUM]
++      🟡 keychain — May access the macOS keychain
++      🟢 password — references a 'password': PasswordBasedCipher, to countless passwords
++      🟢 ssl/private_key — References private keys: privateKey
│     ▲ cryptography [NONE → MEDIUM]
++      🟢 aes — Supports AES (Advanced Encryption Standard)
++      🟡 blockchain — Uses a blockchain
++      🟢 ed25519 — Elliptic curve algorithm used by TLS and SSH: ed25519
++      🟢 hmac — Uses HMAC (Hash-based Message Authentication Code): HMAC.init
++      🟡 uuid — generates a random UUID: randomUUID
│     ▲ data [LOW → HIGH]
│       🟢 encoding/json_decode — Decodes JSON messages: JSON.parse
│       🟢 encoding/json_encode — encodes JSON: JSON.stringify
++      🟠 builtin/appkit — Includes AppKit, a web3 blockchain library:
++           Price impact reflects the change in market price due to your trade, Select which chain to connect to your …
++      🟡 embedded/base64_url — Contains base64 url: odHRwOi8v::$http
++      🟢 encoding/base64 — Supports base64 encoded strings
++      🟢 encoding/qr_code — works with QR Codes
│     ▲ discovery [NONE → MEDIUM]
++      🟡 system/platform — get system identification: process.platform, process.versions
│     ≡ execution [MEDIUM]
│       🟢 plugin — references a 'plugin':
│            function installPlugin, getExpressionsPlugin, plugins, return expressionsPlugin, setExpressionsPlugin
│       🟡 remote_commands/code_eval — evaluate code dynamically using eval(): eval("
│     ▲ exfiltration [NONE → CRITICAL]
++      🟡 stealer/browser — Uses HTTP, archives, and references multiple browsers:
++           .config, Brave, Chrome, Discord, Firefox, Opera, POST, Safari, https, zip
++      🟡 stealer/credit_card — references 'credit card'
++      🛑 stealer/wallet — makes HTTPS connections and references multiple wallets by name:
++           BraveWallet, CoinbaseBrowser, CoinbaseConnector, CoinbaseInjectedProvider, CoinbaseInjectedSigner, Coinbas…
│     ▲ filesystem [NONE → MEDIUM]
++      🟢 file/open — opens files: open(
++      🟢 mount — mounts file systems
++      🟡 path/relative — references and possibly executes relative path:
++           ./aes, ./blowfish, ./cipher-core, ./core, ./evpkdf, ./format-hex, ./hmac, ./lib-typedarrays, ./mode-cfb, .…
│     ▲ impact [NONE → MEDIUM]
++      🟡 remote_access/agent — references an 'agent': useragent
++      🟡 remote_access/heartbeat — references a 'heartbeat':
++           heartBeatTimeout, heartbeat_pulse, lastHeartbeatResponse, updateLastHeartbeat
++      🟡 resource/bank_xfer — references 'bank transfer'
│     ≡ networking [MEDIUM]
│       🟡 download — download files: download_
│       🟢 url/embedded — contains embedded HTTPS URLs: https://www.jsdelivr.com/using-sri-with-dynamic-files
│       🟢 url/parse — Handles URL strings: new URL
++      🟡 http/form_upload — upload content via HTTP form: POST, application/json, application/x-www-form-urlencoded
++      🟡 http/post — submits content to websites: Content-Type, HTTP, POST, http
++      🟡 http/websocket — supports web sockets:
++           WalletLinkWebSocket, WebSocket:gV, WebSocket:typeof, WebSocketClass:h, WebSocketClass:l, clearWebSocket, w…
++      🟡 ip/addr — mentions an 'IP address': ipAddr
++      🟢 resolve/hostport_parse — Network address and service translation: getaddrinfo
++      🟡 socket/listen — listen on a socket: accept
++      🟢 socket/send — send a message to a socket: _send
++      🟡 url/encode — encodes URL, likely to pass GET variables: urlencode
++      🟡 url/request — requests resources via URL: requests.get(e)
++      🟡 webrtc — makes outgoing WebRTC connections, uses blockchain: RTCPeerConnection
│     ▲ operating-system [MEDIUM → LOW]
--      🟡 time/clock_sleep — uses setInterval to wait
++      🟢 env/get — Retrieve environment variable values: env.DEBUG, env.MODE, env.NEXT, env.NODE
++      🟢 fd/read — reads from a file handle: e.read()
++      🟢 fd/write — writes to a file handle:
++           a.write(o), decoder.write(n), decoder.write(t), e.write(t), i.write(e), t.write(o), this.write(e)
│

Same parent directory, related files, internal scan, explicitly passing in files:

$ $HOME/go/1.23.2/bin/mal diff old/lottie-player.min.js new/lottie-player.min.js
├─ 🛑 Changed: new/lottie-player.min.js [MEDIUM → CRITICAL]
│     ▲ anti-static [NONE → CRITICAL]
++      🟡 obfuscation/generic/hex_conversion — converts hex data to ASCII: toString("hex");
++      🟠 obfuscation/js/bitwise — uses an excessive amount of unsigned bitwise math:
++           a>>>0, a>>>11, a>>>13, a>>>15, a>>>16, a>>>22, a>>>24, a>>>25, a>>>26, a>>>31, a>>>32, a>>>6, a>>>8, b>>>0…
++      🟡 obfuscation/js/char_codes — obfuscated javascript that relies on character manipulation:
++           charAt, charCodeAt, const, fromCharCode, function(, length, push, shift, toString, {return
++      🟠 obfuscation/js/char_to_int — converts manipulated numbers into characters:
++           charAt(a>>>6*(3-l)&6, charAt(n%l),t--)}(, charAt(s>>>6*(3-a)&6, function(
++      🛑 obfuscation/js/ebe — highly obfuscated javascript (eBe):
++           charCodeAt, eBe(-1), eBe(-10), eBe(-11), eBe(-12), eBe(-13), eBe(-14), eBe(-15), eBe(-16), eBe(-17), eBe(-…
++      🟠 obfuscation/js/int_to_char — converts incremented numbers into characters:
++           charCodeAt(++a), charCodeAt(++d), charCodeAt(++n), charCodeAt(++s), function(
++      🟠 obfuscation/js/power — uses many powered array elements (>25):
++           charAt(a, charAt(c, charAt(n, charAt(s, charAt(t, charAt(u, charAt(w, e[0]^e[10], e[10]^e[20], e[11]^e[21]…
│     ▲ command & control [NONE → HIGH]
++      🟡 addr/ip — hardcoded IP address:
++           114.243.154.69, 13.182.181.343, 13.23.32.42, 14.22.33.243, 14.52.54.92, 146.288.257.686, 15.15.34.34, 15.2…
++      🟠 addr/url_unusual — Contains HTTP hostname with unusual top-level domain:
++           https://api.mantlescan.xyz/, https://mantlescan.xyz/, https://openchain.xyz/
│     ▲ credential [NONE → MEDIUM]
++      🟡 keychain — May access the macOS keychain
++      🟢 password — references a 'password': PasswordBasedCipher, to countless passwords
++      🟢 ssl/private_key — References private keys: privateKey
│     ▲ cryptography [NONE → MEDIUM]
++      🟢 aes — Supports AES (Advanced Encryption Standard)
++      🟡 blockchain — Uses a blockchain
++      🟢 ed25519 — Elliptic curve algorithm used by TLS and SSH: ed25519
++      🟢 hmac — Uses HMAC (Hash-based Message Authentication Code): HMAC.init
++      🟡 uuid — generates a random UUID: randomUUID
│     ▲ data [LOW → HIGH]
│       🟢 encoding/json_decode — Decodes JSON messages: JSON.parse
│       🟢 encoding/json_encode — encodes JSON: JSON.stringify
++      🟠 builtin/appkit — Includes AppKit, a web3 blockchain library:
++           Price impact reflects the change in market price due to your trade, Select which chain to connect to your …
++      🟡 embedded/base64_url — Contains base64 url: odHRwOi8v::$http
++      🟢 encoding/base64 — Supports base64 encoded strings
++      🟢 encoding/qr_code — works with QR Codes
│     ▲ discovery [NONE → MEDIUM]
++      🟡 system/platform — get system identification: process.platform, process.versions
│     ≡ execution [MEDIUM]
│       🟢 plugin — references a 'plugin':
│            function installPlugin, getExpressionsPlugin, plugins, return expressionsPlugin, setExpressionsPlugin
│       🟡 remote_commands/code_eval — evaluate code dynamically using eval(): eval("
│     ▲ exfiltration [NONE → CRITICAL]
++      🟡 stealer/browser — Uses HTTP, archives, and references multiple browsers:
++           .config, Brave, Chrome, Discord, Firefox, Opera, POST, Safari, https, zip
++      🟡 stealer/credit_card — references 'credit card'
++      🛑 stealer/wallet — makes HTTPS connections and references multiple wallets by name:
++           BraveWallet, CoinbaseBrowser, CoinbaseConnector, CoinbaseInjectedProvider, CoinbaseInjectedSigner, Coinbas…
│     ▲ filesystem [NONE → MEDIUM]
++      🟢 file/open — opens files: open(
++      🟢 mount — mounts file systems
++      🟡 path/relative — references and possibly executes relative path:
++           ./aes, ./blowfish, ./cipher-core, ./core, ./evpkdf, ./format-hex, ./hmac, ./lib-typedarrays, ./mode-cfb, .…
│     ▲ impact [NONE → MEDIUM]
++      🟡 remote_access/agent — references an 'agent': useragent
++      🟡 remote_access/heartbeat — references a 'heartbeat':
++           heartBeatTimeout, heartbeat_pulse, lastHeartbeatResponse, updateLastHeartbeat
++      🟡 resource/bank_xfer — references 'bank transfer'
│     ≡ networking [MEDIUM]
│       🟡 download — download files: download_
│       🟢 url/embedded — contains embedded HTTPS URLs: https://www.jsdelivr.com/using-sri-with-dynamic-files
│       🟢 url/parse — Handles URL strings: new URL
++      🟡 http/form_upload — upload content via HTTP form: POST, application/json, application/x-www-form-urlencoded
++      🟡 http/post — submits content to websites: Content-Type, HTTP, POST, http
++      🟡 http/websocket — supports web sockets:
++           WalletLinkWebSocket, WebSocket:gV, WebSocket:typeof, WebSocketClass:h, WebSocketClass:l, clearWebSocket, w…
++      🟡 ip/addr — mentions an 'IP address': ipAddr
++      🟢 resolve/hostport_parse — Network address and service translation: getaddrinfo
++      🟡 socket/listen — listen on a socket: accept
++      🟢 socket/send — send a message to a socket: _send
++      🟡 url/encode — encodes URL, likely to pass GET variables: urlencode
++      🟡 url/request — requests resources via URL: requests.get(e)
++      🟡 webrtc — makes outgoing WebRTC connections, uses blockchain: RTCPeerConnection
│     ▲ operating-system [MEDIUM → LOW]
--      🟡 time/clock_sleep — uses setInterval to wait
++      🟢 env/get — Retrieve environment variable values: env.DEBUG, env.MODE, env.NEXT, env.NODE
++      🟢 fd/read — reads from a file handle: e.read()
++      🟢 fd/write — writes to a file handle:
++           a.write(o), decoder.write(n), decoder.write(t), e.write(t), i.write(e), t.write(o), this.write(e)
│

An external scan passing in explicit file names results in the same output as the internal scan.

egibs requested a review from tstromberg November 4, 2024 22:15
@@ -99,6 +120,52 @@ func processSrc(ctx context.Context, c malcontent.Config, src, dest map[string]*
}
}

func processDest(ctx context.Context, c malcontent.Config, from, to map[string]*malcontent.FileReport, d *malcontent.DiffReport) {

I moved these functions around to make them easier to reference. They're coupled enough that jumping back and forth was a little painful.

if !behaviorExists(tb, fr.Behaviors) {
tb.DiffAdded = true
abs.Behaviors = append(abs.Behaviors, tb)
continue

I also added this continue.


egibs commented Nov 4, 2024

Still seeing some odd diffs in the test data. Working through those now.

egibs marked this pull request as draft November 4, 2024 22:25

egibs commented Nov 5, 2024

@tstromberg -- let me know whether you think the updated test data is accurate. Most of the files we diff have entirely different names so the deleted/added behavior seems correct to me.

Since we don't compute the Levenshtein distance for any non-JSON/SPDX files, we can't really expect files like ls and ls.x86_64 to count as moves.
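For context on the distance check mentioned above, here is a minimal sketch of Levenshtein edit distance between two file names. It is an illustration only: the levenshtein and min3 functions are hypothetical, and the library and threshold the project actually uses for move detection are not shown in this thread.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the number of single-character insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	// Seven insertions (".x86_64") separate the two names.
	fmt.Println(levenshtein("ls", "ls.x86_64"))
}
```

A name-similarity check along these lines could pair ls with ls.x86_64, but without it the two names are simply different keys, so they show up as a delete plus an add.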

egibs marked this pull request as ready for review November 5, 2024 16:51
egibs merged commit a154e05 into chainguard-dev:main Nov 5, 2024
8 checks passed
egibs deleted the fix-diff-behavior branch November 5, 2024 17:38
Successfully merging this pull request may close these issues.

diff broken: considers two files or directories as delete+add rather than modify
2 participants