Skip to content

Commit

Permalink
fix(5xx): should return 4xx when file not found (#1291)
Browse files Browse the repository at this point in the history
## Problem

Described in linear: 
missing media files in pages leads to a 5xx when the resource name itself cannot be processed

as an example, consider a link (some_link)[/%2fa/b] -> this resolves to //a/b, which is not processable by our backend as it results in an error like so: Error when getting Git blob hash: Error: HEAD:images/technews//tvws-part2.JPG  and in turn causes a 5xx. 

this 5xx is caused because GitFileService (iirc) will try to determine the blob hash (see if can find the file on disk) and in doing so, it will get an error (as the file doesn't exist), leading to the 500.  

see here for an example where this occurs

in order to reproduce this: 
- [ ] go to any site with an image
- [ ] create or click on an existing markdown page
- [ ] in the page, add an image
- [ ] open network console
- [ ] update the link to be /images/%2f<existing>
- [ ] there should be a 5xx in the network console.


Additionally, we also see a `fatal: path '.git' does not exist in 'HEAD'` in the logs every time the workspace is refreshed. 
This is because we rely on `fileStats` to get a list of directories, but forget to remove special directories such as `.git`. 
## Solution

Reason for this is that git under the hood does a file path normalision for checking if the file exists in local disk, but does not when checking when it exists in head. 

consider these three examples:

<img width="651" alt="Screenshot 2024-04-11 at 12 58 41 PM" src="https://github.com/isomerpages/isomercms-backend/assets/42832651/0c4e39f6-cd5f-428f-9849-92adacb41998">

We use fileStats to see if a file exists before passing it to `GetGitBlobHash`. issue with this is that while the file could exist in disk and still return a correct folder stats, this should still return a file not found error as the path given by the user is still incorrect. (am opting for this rather than normalising the path in the function `GetGitBlobHash` to keep the implementation of `GetGitBlobHash` lean and not think about potential path traversal attacks.)

for the special files, we simply do a filter to ignore files like `.git`. We know that the class of errors with the strings ` exists on disk` and `Error when getting Git blob hash` mostly comes from the .git folder  by looking at dd logs [here](https://ogp.datadoghq.com/logs?query=service%3Aisomer%20env%3Aproduction%20%40aws.awslogs.logStream%3A%2Aecs%2A%20source%3A%22elasticbeanstalk%2Fcms-backend-prod-node18%2Fvar%2Flog%2Fweb.stdout.log%22%20%22Error%20when%20getting%20Git%20blob%20hash%22%20%22exists%20on%20disk%22%20-%22.git%22%20&cols=host%2Cservice&fromUser=true&index=%2A&messageDisplay=inline&refresh_mode=sliding&saved-view-id=2492484&storage=hot&stream_sort=desc&viz=stream&from_ts=1711516320734&to_ts=1712812320734&live=true). 

ie all instances of `exists on disk` and `Error when getting Git blob hash` errors in [prod for the past two weeks](https://ogp.datadoghq.com/logs?query=service%3Aisomer%20env%3Aproduction%20%40aws.awslogs.logStream%3A%2Aecs%2A%20source%3A%22elasticbeanstalk%2Fcms-backend-prod-node18%2Fvar%2Flog%2Fweb.stdout.log%22%20%22Error%20when%20getting%20Git%20blob%20hash%22%20%22exists%20on%20disk%22%20&cols=host%2Cservice&fromUser=true&index=%2A&messageDisplay=inline&refresh_mode=sliding&saved-view-id=2492484&storage=hot&stream_sort=desc&viz=stream&from_ts=1711516320734&to_ts=1712812320734&live=true) came from trying to access the .git folders. 

**Breaking Changes**

<!-- Does this PR contain any backward incompatible changes? If so, what are they and should there be special considerations for release? -->

- [ ] Yes - this PR contains breaking changes
  - Details ...
- [X] No - this PR is backwards compatible with ALL of the following feature flags in this [doc](https://www.notion.so/opengov/Existing-feature-flags-518ad2cdc325420893a105e88c432be5)



## Before & After Screenshots

**BEFORE**:
![Screenshot 2024-04-11 at 12 51 22 PM](https://github.com/isomerpages/isomercms-backend/assets/42832651/0813b86e-7c7f-419d-a3bf-597ffcdbf3d5)

<!-- [insert screenshot here] -->

**AFTER**:
![Screenshot 2024-04-11 at 12 52 17 PM](https://github.com/isomerpages/isomercms-backend/assets/42832651/0bb56f08-e0cf-4c73-bb31-217db10f5697)

<!-- [insert screenshot here] -->

## Tests

in order to reproduce this: 
- [ ] go to any site with an image
- [ ] create or click on an existing markdown page
- [ ] in the page, add an image
- [ ] open network console
- [ ] update the link to be /images/%2f<existing>
- [ ] there should be a 4xx in the network console.
- [ ] go into datadog, there should not be any instance of `fatal: path '.git' exists on disk, but not in 'HEAD'`
  • Loading branch information
kishore03109 authored Apr 16, 2024
1 parent 4f91112 commit 9f10ec1
Show file tree
Hide file tree
Showing 2 changed files with 51 additions and 2 deletions.
16 changes: 14 additions & 2 deletions src/services/db/GitFileSystemService.ts
Original file line number Diff line number Diff line change
Expand Up @@ -319,7 +319,7 @@ export default class GitFileSystemService {
repoName: string,
filePath: string,
isStaging: boolean
): ResultAsync<string, GitFileSystemError> {
): ResultAsync<string, GitFileSystemError | NotFoundError> {
const efsVolPath = isStaging
? EFS_VOL_PATH_STAGING
: EFS_VOL_PATH_STAGING_LITE
Expand All @@ -333,6 +333,14 @@ export default class GitFileSystemService {
)

if (error instanceof GitError) {
// NOTE: While some path can be potentially normalised by Git (eg. images//path.png),
// it is not guaranteed to exist in HEAD. We simply return a not found error in this case.
if (
error.message.includes("fatal: path") &&
error.message.includes("exists on disk, but not in 'HEAD'")
) {
return new NotFoundError("File/Directory does not exist")
}
return new GitFileSystemError("Unable to determine Git blob hash")
}

Expand Down Expand Up @@ -1121,7 +1129,11 @@ export default class GitFileSystemService {
)
)
.andThen((directoryContents) => {
const resultAsyncs = directoryContents.map((directoryItem) => {
const IGNORED_FILES = [".git"]
const filteredDirectoryContents = directoryContents.filter(
(directoryItem) => !IGNORED_FILES.includes(directoryItem.name)
)
const resultAsyncs = filteredDirectoryContents.map((directoryItem) => {
const isDirectory = directoryItem.isDirectory()
const { name } = directoryItem
const path = directoryPath === "" ? name : `${directoryPath}/${name}`
Expand Down
37 changes: 37 additions & 0 deletions src/services/db/__tests__/GitFileSystemService.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,11 @@ const dirTree = {
},
}

const dirTreeWithIgnoredFiles = {
...dirTree,
".git": "fake git directory",
}

describe("GitFileSystemService", () => {
beforeEach(() => {
mockFs({
Expand Down Expand Up @@ -172,6 +177,18 @@ describe("GitFileSystemService", () => {

expect(result._unsafeUnwrapErr()).toBeInstanceOf(NotFoundError)
})

it("should ignore .git folder", async () => {
mockFs({
[EFS_VOL_PATH_STAGING]: dirTreeWithIgnoredFiles,
})
const result = await GitFileSystemService.listDirectoryContents(
"fake-repo",
"fake-empty-dir",
DEFAULT_BRANCH
)
expect(result._unsafeUnwrap()).toHaveLength(0)
})
})

describe("listPaginatedDirectoryContents", () => {
Expand Down Expand Up @@ -661,6 +678,26 @@ describe("GitFileSystemService", () => {

expect(result._unsafeUnwrapErr()).toBeInstanceOf(GitFileSystemError)
})

it("should return not found error for a non-existent file", async () => {
MockSimpleGit.cwd.mockReturnValueOnce({
revparse: jest
.fn()
.mockRejectedValueOnce(
new GitError(
undefined,
`fatal: path //fake-media-file.png exists on disk, but not in 'HEAD'`
)
),
})
const result = await GitFileSystemService.getGitBlobHash(
"fake-repo",
"fake-dir/%2ffake-media-file.png",
true
)

expect(result._unsafeUnwrapErr()).toBeInstanceOf(NotFoundError)
})
})

describe("getFilePathStats", () => {
Expand Down

0 comments on commit 9f10ec1

Please sign in to comment.