
Support static pages #37

Closed
SoniEx2 opened this issue Dec 31, 2020 · 20 comments


SoniEx2 commented Dec 31, 2020

It'd be nice if static pages were supported. We want to rewrite our project in Rust, and for security reasons we recommend not running complex scripts and programs on the HTTP server, which means we recommend serving all git repos over the so-called "dumb" protocol.

Our current implementation just uses git directly and clones/fetches everything into a single cache repo, which means we can't parallelize it. We're looking into alternatives, whether by using something like gitoxide or by having multiple temporary caches and joining them together after fetching. Either way, static pages support is a must.
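To make the "multiple temporary caches" idea concrete, here is a minimal sketch: each remote is fetched into its own bare repository in parallel, and the results are then joined into the main cache with local fetches. This is not GAnarchy's git.py; the function names, the `cache-N` directories and the `src-N` ref namespace are all assumptions made up for illustration, and it still relies on a system-wide git.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_git(args, cwd=None):
    """Run one git command and raise if it fails (assumes git is on PATH)."""
    subprocess.run(["git", *args], cwd=cwd, check=True)


def fetch_into_temp_cache(url, workdir, index, run=run_git):
    """Fetch all branches of `url` into its own bare repository."""
    cache = Path(workdir) / f"cache-{index}"  # hypothetical naming scheme
    run(["init", "--bare", "--quiet", str(cache)])
    run(["fetch", "--quiet", url, "+refs/heads/*:refs/heads/*"], cwd=str(cache))
    return cache


def fetch_all(urls, main_cache, run=run_git, max_workers=4):
    """Fetch every URL in parallel, then join the results into `main_cache`."""
    with tempfile.TemporaryDirectory() as workdir:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [
                pool.submit(fetch_into_temp_cache, url, workdir, i, run)
                for i, url in enumerate(urls)
            ]
            caches = [future.result() for future in futures]
        # Joining is sequential and purely local, so it needs no network access.
        for i, cache in enumerate(caches):
            refspec = f"+refs/heads/*:refs/remotes/src-{i}/*"  # hypothetical namespace
            run(["fetch", "--quiet", str(cache), refspec], cwd=str(main_cache))
```

The `run` parameter exists only so the command sequence can be exercised without a network; in real use the default `run_git` would be left in place.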

@Byron Byron added the question label Jan 1, 2021
Member

Byron commented Jan 1, 2021

Thanks for letting me know! The current stance of gitoxide is to not implement the 'dumb' protocol on the client side. This is due to it being seemingly underspecified.

From what's stated here, I gather that you are interested in the server side. There, too, I don't know how generating these caches would work, or which specific problems you are running into.

I think further comments here could try to fill in the gaps to understand the requirements and allow an implementation to happen.

Thank you.

Author

SoniEx2 commented Jan 1, 2021

Nah, we only need it on the "client" (the ability to clone).

Take a look at https://github.com/ganarchy/GAnarchy

On the "server", or more specifically on the git host, plain git works fine - git is good enough at generating static pages when pushing over SSH, so there's no point trying to replace it there.

Member

Byron commented Jan 1, 2021

I took a cursory look at GAnarchy, but it didn't jump out at me how gitoxide could help there.

Since gitoxide isn't yet able to do an entire clone itself and many features are missing, I recommend checking back once more of the feature set I think you would need is available.

In the meantime, I am closing this issue as it lacks the detail necessary to aid any implementation.

@Byron Byron closed this as completed Jan 1, 2021
Author

SoniEx2 commented Jan 1, 2021

These would be replaced with a Gitoxide-based equivalent: https://github.com/ganarchy/GAnarchy/blob/ganarchy/ganarchy/git.py

Many of the repos here do not support the v1/v2 protocol: https://github.com/ganarchy/ganarchy.github.io/blob/master/index.toml

Author

SoniEx2 commented Jul 30, 2022

The "dumb" protocol is simply ".git dir over HTTP"

Since HTTP doesn't provide an index, git has to provide that index, but otherwise you just have a .git like you would on a filesystem. This comes with all the stuff a normal .git comes with including packfiles and the like.
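As a rough illustration of the ".git dir over HTTP" layout: the index files are `info/refs` and `objects/info/packs` (the ones `git update-server-info` writes), and everything else is fetched at the same paths it has on disk. The sketch below only parses those two files and builds URLs; it performs no HTTP requests, and the function names are assumptions made up for this example, not gitoxide or git APIs.

```python
def parse_info_refs(text):
    """Parse `info/refs`: one '<object id>TAB<ref name>' entry per line."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        oid, name = line.split("\t", 1)
        # Peeled tag entries (names ending in '^{}') are kept as-is here.
        refs[name] = oid
    return refs


def parse_pack_list(text):
    """Parse `objects/info/packs`: lines of the form 'P pack-<hash>.pack'."""
    return [line[2:].strip() for line in text.splitlines() if line.startswith("P ")]


def loose_object_url(base, oid):
    """A loose object lives at objects/<first 2 hex digits>/<remaining digits>."""
    return f"{base.rstrip('/')}/objects/{oid[:2]}/{oid[2:]}"


def pack_url(base, pack_name):
    """Packfiles (and their .idx files) live under objects/pack/."""
    return f"{base.rstrip('/')}/objects/pack/{pack_name}"
```

A client would read `info/refs` to learn the ref tips, try the loose object URL for each object it needs, and fall back to the packs listed in `objects/info/packs` when a loose object is missing.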

We'd like to ask for this again, as our plan is to have self-embedding git repos for our HTTP-based projects, and it'd be nice if gitoxide could handle them so we don't need to require system-wide git.

Member

Byron commented Jul 31, 2022

Welcome back!

I took a cursory look at git.py and thought that a lot of that should be doable in gitoxide already. When it comes to maintaining an index to support the 'dumb' protocol, I would think that something like this exists or can be written by the ganarchy project as a git hook maybe.

By the end of this year gitoxide will be production ready for cloning/fetching repositories and checking out a worktree correctly and completely, if that helps.

Author

SoniEx2 commented Jul 31, 2022

Can gitoxide fetch all of the repos listed here? https://ganarchy.autistic.space/index.toml

This is the main blocker for us being able to use gitoxide.

(Also, the git.py linked earlier has since been updated: the new version was written specifically to parallelize a large number of git fetch operations, and it does an amazing job at that.)

Member

Byron commented Jul 31, 2022

Here is a listing of the supported transport protocols. Dumb HTTP is not among them, so I presume gitoxide cannot fetch these repos. It looks like git2 doesn't support this protocol either, so the chances of ever supporting it on the client side in gitoxide aren't high unless the implementation is contributed.

GitHub doesn't support line-links in markdowns unfortunately, so here is what the link above should have displayed.

[Screenshot "Screen Shot 2022-07-31 at 09 27 48": the linked listing of supported transport protocols]

Author

SoniEx2 commented Jul 31, 2022

how hard would it be to use the on-disk format code ("the on-disk format must remain compatible, and we will never contend with it.") in the HTTP transport?

(it'd be nice to reuse it so it can be kept in sync... vs maintaining a separate copy)

Member

Byron commented Aug 1, 2022

> how hard would it be to use the on-disk format code ("the on-disk format must remain compatible, and we will never contend with it.") in the HTTP transport?

I don't know how these two are related and would assume one is entirely independent of the other. So one would probably have to find out.

Author

SoniEx2 commented Aug 1, 2022

the static pages protocol is just the on-disk format over HTTP. so they're very related.

decoupling them allows better transfer optimizations but then you kinda have to maintain the same thing twice.

Member

Byron commented Aug 1, 2022

Your work will be valuable as it will pave the way for eventually merging such capability back into mainline or be the basis for a backport. It's not on my agenda at all to add static http support, but I am happy to assist with questions should they arise on your own journey on getting it done.

Author

SoniEx2 commented Aug 1, 2022

hmm, thoughts on optimizing gitoxide for network-attached storage and the like? (gitoxide over google drive maybe?) such work would likely benefit attempts to add static pages protocol support (at least, using shared code).

basically, can you make the on-disk format driver backend-agnostic and optimized for latency?

Member

Byron commented Aug 2, 2022

> hmm, thoughts on optimizing gitoxide for network-attached storage and the like?

No. But I think a first step would be to get anything to work in this regard.

Author

SoniEx2 commented Aug 20, 2022

actually hmm does gitoxide have support for VFS and local filesystem clone?

Member

Byron commented Aug 21, 2022

gitoxide can perform a git-aware clone from a location specified by path. Since I don't know what VFS means exactly in this context, I'd say 'no' to this one.

Author

SoniEx2 commented Aug 21, 2022

hmm... does that mean this is unsupported?

       -l, --local
           When the repository to clone from is on a local machine, this flag
           bypasses the normal "Git aware" transport mechanism and clones the
           repository by making a copy of HEAD and everything under objects
           and refs directories. The files under .git/objects/ directory are
           hardlinked to save space when possible.

           If the repository is specified as a local path (e.g.,
           /path/to/repo), this is the default, and --local is essentially a
           no-op. If the repository is specified as a URL, then this flag is
           ignored (and we never use the local optimizations). Specifying
           --no-local will override the default when /path/to/repo is given,
           using the regular Git transport instead.

           NOTE: this operation can race with concurrent modification to the
           source repository, similar to running cp -r src dst while modifying
           src.

Member

Byron commented Aug 21, 2022

That is unsupported indeed - gitoxide currently uses 'git-aware' mechanisms only. Local clone optimizations can happen at some point though, it's just not a priority yet.

Author

SoniEx2 commented Aug 21, 2022

ahh. hmm...

"VFS" just means virtual filesystem. e.g. sshfs. however we now realize that wouldn't work for implementing this. we think we just have to do it properly... does gitoxide support http/2 (especially multiplexing)? that would help us a lot. where would a proper implementation of this go?

Member

Byron commented Aug 21, 2022

http is supported in a blocking fashion via curl and the implementation of that is here: https://github.com/Byron/gitoxide/blob/9509ce4faeca8b4e1527bac625370403495bb03c/git-transport/src/client/blocking_io/http/mod.rs .

Since you are interested in static http support, I recommend adding support for it into the curl-based http implementation. I think right now it specifically checks for the 'smart' transport, and with the right test or two it should be possible to support the 'dumb' one as well without breakage.
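For context on what "checking for the 'smart' transport" involves, git's HTTP protocol documentation describes the discovery step roughly as follows: a smart client requests `info/refs?service=git-upload-pack`, and treats the server as smart only if the response has the matching `application/x-git-upload-pack-advertisement` content type; anything else is a plain static file, i.e. the dumb protocol. The sketch below illustrates that decision only. It is not gitoxide's code, and the function names are hypothetical.

```python
def discovery_url(base, service="git-upload-pack"):
    """URL a smart-capable client requests first to discover refs."""
    return f"{base.rstrip('/')}/info/refs?service={service}"


def advertisement_content_type(service):
    """Content type a smart server uses for its ref advertisement."""
    return f"application/x-{service}-advertisement"


def is_smart_response(status, content_type, service="git-upload-pack"):
    """Decide whether a discovery response came from a smart server."""
    if status != 200:
        return False
    # Ignore parameters such as '; charset=utf-8' when comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == advertisement_content_type(service)
```

When `is_smart_response` returns false for a successful response, the body is just the static `info/refs` file, which is the point where a dumb-protocol fallback could take over instead of failing.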

Byron added a commit that referenced this issue Oct 15, 2023
The new URL should trigger an overflow check but it only
happens when `url::Url::parse()` is called directly as our
code doesn't let it through anymore.

Here is the log from the fuzzer run as reported:
```
	[Environment] ASAN_OPTIONS=handle_abort=2
+----------------------------------------Release Build Stacktrace----------------------------------------+
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse -rss_limit_mb=2560 -timeout=60 -runs=100 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805
Time ran: 0.12717413902282715
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1001182178
INFO: Loaded 1 modules   (90683 inline 8-bit counters): 90683 [0x5d0c0597cce0, 0x5d0c05992f1b),
INFO: Loaded 1 PC tables (90683 PCs): 90683 [0x5d0c05992f20,0x5d0c05af52d0),
/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse: Running 1 inputs 100 time(s) each.
Running: /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805
thread '<unnamed>' panicked at /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:272:17:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1200==ERROR: AddressSanitizer: ABRT on unknown address 0x0539000004b0 (pc 0x7c51742fb00b bp 0x7ffcd1eadd80 sp 0x7ffcd1eadaf0 T0)
    #0 0x7c51742fb00b in raise /build/glibc-SzIz7B/glibc-2.31/sysdeps/unix/sysv/linux/raise.c:51:1
    #1 0x7c51742da858 in abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79:7
    #2 0x5d0c0572a1e6 in std::sys::unix::abort_internal::he854d2f74b119e66 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys/unix/mod.rs:375:14
    #3 0x5d0c0518cda6 in std::process::abort::h68c27a968dc7c74f /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/process.rs:2271:5
    #4 0x5d0c0564d3d4 in libfuzzer_sys::initialize::_$u7b$$u7b$closure$u7d$$u7d$::h1e76e422e0c48db0 /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:57:9
    #5 0x5d0c0571dcf7 in _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..Fn$LT$Args$GT$$GT$::call::h0c028c5af3475e03 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/alloc/src/boxed.rs:2021:9
    #6 0x5d0c0571dcf7 in std::panicking::rust_panic_with_hook::hd26c5407fbf20d71 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:735:13
    #7 0x5d0c0571da05 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h944e23ea90982f5a /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:601:13
    #8 0x5d0c0571aee5 in std::sys_common::backtrace::__rust_end_short_backtrace::h8a3632d339dd3313 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys_common/backtrace.rs:170:18
    #9 0x5d0c0571d781 in rust_begin_unwind /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:597:5
    #10 0x5d0c05190634 in core::panicking::panic_fmt::h85c36fc727234039 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:72:14
    #11 0x5d0c051906d2 in core::panicking::panic::h6a47ed7881a36f4d /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:127:5
    #12 0x5d0c053e09c6 in idna::punycode::encode_into::hd674630fb161bf5b /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:0
    #13 0x5d0c053eacbc in idna::uts46::Idna::to_ascii_inner::h69c52eb69ae48276 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:469:34
    #14 0x5d0c053eb793 in idna::uts46::Idna::to_ascii::h76237795045112f3 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:481:26
    #15 0x5d0c053eda7a in idna::uts46::Config::to_ascii::h423c722ab2fa9813 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:572:9
    #16 0x5d0c053f0070 in idna::domain_to_ascii::h93e94e995d03e9ef /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/lib.rs:64:5
    #17 0x5d0c0530374a in url::host::Host::domain_to_ascii::h6cb1ae8fe42a1e42 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:166:9
    #18 0x5d0c0530374a in url::host::Host::parse::h962d3990e0ff5091 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:86:22
    #19 0x5d0c0532e87d in url::parser::Parser::parse_host::h89faea9182ce2512 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:1024:20
    #20 0x5d0c0532ba8d in url::parser::Parser::parse_host_and_port::heb44bd7ebd2593f6 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:944:33
    #21 0x5d0c0532896f in url::parser::Parser::after_double_slash::hbb313f562f0978a2 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:843:13
    #22 0x5d0c0531a129 in url::parser::Parser::parse_with_scheme::h54a417e4650ea024 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:453:17
    #23 0x5d0c05317824 in url::parser::Parser::parse_url::hfa6b21c53cd0ac1c /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:366:20
    #24 0x5d0c05350ba0 in url::ParseOptions::parse::hb8b3309b3b920457 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:257:9
    #25 0x5d0c052b1ffc in url::Url::parse::h82a965c69df59bba /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:292:9
    #26 0x5d0c052b1ffc in gix_url::parse::input_to_utf8_and_url::h14b70a32a8884316 gitoxide/gix-url/src/parse.rs:252:5
    #27 0x5d0c052a9b9d in gix_url::parse::url::h6f7f7b0bddf4b8d7 gitoxide/gix-url/src/parse.rs:99:24
    #28 0x5d0c052b34cc in gix_url::parse::hfaab74909f01c9cc gitoxide/gix-url/src/lib.rs:38:46
    #29 0x5d0c05270b4a in rust_fuzzer_test_input gitoxide/gix-url/fuzz/fuzz_targets/parse.rs:5:14
    #30 0x5d0c0564d537 in __rust_try libfuzzer_sys.f28e88650cadb2d4-cgu.0:0
    #31 0x5d0c0564c79f in std::panicking::try::h90783eeef7e35925 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:468:19
    #32 0x5d0c0564c79f in std::panic::catch_unwind::h041b281a0e92d580 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panic.rs:142:14
    #33 0x5d0c0564c79f in LLVMFuzzerTestOneInput /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:28:22
    #34 0x5d0c0566bb83 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #35 0x5d0c056572e2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #36 0x5d0c0565cb8c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #37 0x5d0c056860c2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #38 0x7c51742dc082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/libc-start.c:308:16
    #39 0x5d0c05191c4d in _start
```