XML::Node#to_s truncates large files #4710

Closed
pedantic-git opened this issue Jul 14, 2017 · 23 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. status:needs-more-info topic:stdlib:serialization

Comments

@pedantic-git

Hi Crystal devs! Thanks for fixing my earlier XML issue so quickly. I'm afraid I have another one.

It seems that XML::Node#to_s truncates very large files; in my case the output is cut off at around 1.2MB.

Try this:

require "xml"
f = File.open "/path/to/EnragedBull.svg"
xml = XML.parse(f)
puts xml.to_s
# or xml.to_s(STDOUT)

(you can get the EnragedBull.svg here: https://openclipart.org/download/282790/EnragedBull.svg )

The file, which is 2.2MB to begin with, is cut short around 1.2MB.

I had a cursory look at the code but since it's making calls into LibXML I'm afraid I'm at a loss to fix it myself.
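
For anyone trying to confirm the symptom, here is a minimal shell sketch of one way to tell a truncated serialization from a complete one. The helper name `looks_truncated` is hypothetical, and the check assumes the document's root element is an unprefixed `svg`, so that a complete file ends with `</svg>`. It only inspects the tail of the file and is not a full well-formedness check.

```shell
#!/bin/sh
# Hypothetical helper, not part of Crystal or libxml2.
# Prints "complete" if the last 256 bytes of FILE contain the closing tag
# of ROOT (for example </svg>), otherwise prints "truncated".
looks_truncated() {
  file=$1
  root=$2
  # A serialization that was cut short loses the closing root tag,
  # so looking at the end of the file is enough for a quick check.
  if tail -c 256 "$file" | grep -q "</$root>"; then
    echo "complete"
  else
    echo "truncated"
  fi
}

# Example usage (file names taken from the report above):
# looks_truncated output.svg svg
# wc -c EnragedBull.svg output.svg   # byte counts, for comparison
```

Comparing byte counts alone can mislead, because a re-serialized document may legitimately differ in size from its input; the closing-tag check avoids that ambiguity.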

@bmmcginty
Contributor

Not a dev, but I figured I'd try to help. What Crystal version and hardware are you on? I can't reproduce this on 0.23.0 (or on my oldest available version, 0.20.5).

@pedantic-git
Author

Hi @bmmcginty - thanks!

crystal --version
Crystal 0.23.1 [e2a1389] (2017-07-13) LLVM 3.8.1

ldd ./test
	linux-vdso.so.1 =>  (0x00007ffe0cec3000)
	libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007fb76adb3000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb76ab95000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb76a98d000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb76a789000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb76a572000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb76a1a9000)
	/lib64/ld-linux-x86-64.so.2 (0x00005589afda1000)
	libicuuc.so.57 => /usr/lib/x86_64-linux-gnu/libicuuc.so.57 (0x00007fb769e01000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fb769be5000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fb7699bf000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb7696b6000)
	libicudata.so.57 => /usr/lib/x86_64-linux-gnu/libicudata.so.57 (0x00007fb767c39000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb7678af000)

dpkg -l libxml2
||/ Name                         Version             Architecture        Description
+++-============================-===================-===================-==============================================================
ii  libxml2:amd64                2.9.4+dfsg1-2.2     amd64               GNOME XML library
ii  libxml2:i386                 2.9.4+dfsg1-2.2     i386                GNOME XML library

I also noticed that the compiler bombs when I try to build the above test script with --release:

crystal build --release test.cr
crystal: /var/cache/omnibus/src/llvm/llvm-3.8.1.src/lib/CodeGen/LexicalScopes.cpp:160: llvm::LexicalScope* llvm::LexicalScopes::getOrCreateRegularScope(const llvm::DILocalScope*): Assertion `cast<DISubprogram>(Scope)->describes(MF->getFunction())' failed.
/usr/bin/crystal: line 102: 15114 Aborted                 (core dumped) "$INSTALL_DIR/embedded/bin/crystal" "$@"

Let me know if there's anything else I can help with!

@bmmcginty
Contributor

Just tried with that exact revision of Crystal, so I suspect it's your LLVM version. What distro are you running? Would it be possible to upgrade LLVM? If not, give me the details of your distro and I can try to spin up a cloud VM and see what I can do to assist, though I'm not sure exactly what yet.

@pedantic-git
Author

Thanks! I'm away from my computer right now but it's Ubuntu 17.04 (x86_64), fully patched every day.

@pedantic-git
Author

@bmmcginty Just looking at it now - looks like my Crystal binary (from the official Crystal Ubuntu repo) is statically linked against LLVM 3.8.1, but my distro does have 4.0 installed.

@asterite
Member

asterite commented Sep 9, 2017

@pedantic-git I can't reproduce this. Is there any chance you can show us the output of that puts xml.to_s?

@pedantic-git
Author

@asterite Huh - funny that you can't reproduce it with the EnragedBull.svg file. Since filing this I've got a new workstation and I've changed my distro from Ubuntu to Arch but the issue still manifests in the same way!

Here are some links:

@asterite
Member

I'm trying this on OSX, so it might be an issue only in linux. I'll try with docker.

@pedantic-git
Author

Thanks! I suspect it's somewhere in the interface with the underlying LibXML so it wouldn't surprise me if it was OS-specific.

@asterite
Member

@pedantic-git I just tried it in docker and it worked fine. What OS are you using?

@pedantic-git
Author

@asterite I'm using Arch Linux but the same thing happened on Ubuntu.

Try this Dockerfile:

FROM base/devel
RUN pacman -Sy --noconfirm crystal libxml2
WORKDIR /tmp
ADD EnragedBull.svg test.cr /tmp/
RUN bash -c "crystal test.cr > output.svg"
CMD ls -lh

For me when that's run it outputs:

total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 12:46 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 12:50 output.svg
-rw-r--r-- 1 root root   80 Sep 11 12:46 test.cr

@asterite
Member

$ docker build -t crystaltest:xml .
Sending build context to Docker daemon  4.391MB
Step 1/6 : FROM base/devel
 ---> e0972358566d
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Using cache
 ---> 04c52d86dd93
Step 3/6 : WORKDIR /tmp
 ---> Using cache
 ---> abeb5a7b35ed
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> b13c1ef3ed3f
Removing intermediate container 607e0e14a975
Step 5/6 : RUN bash -c "crystal test.cr > output.svg"
 ---> Running in cb9d1585927b
 ---> f9cb0f26c1c7
Removing intermediate container cb9d1585927b
Step 6/6 : CMD ls -lh
 ---> Running in a493a8f343fe
 ---> 3e4af00a31a6
Removing intermediate container a493a8f343fe
Successfully built 3e4af00a31a6
Successfully tagged crystaltest:xml

$ docker run crystaltest:xml
total 4.2M
-rw-r--r-- 1 root root 2.1M Sep 11 13:07 EnragedBull.svg
-rw-r--r-- 1 root root 2.1M Sep 11 13:16 output.svg
-rw-r--r-- 1 root root   79 Sep 11 13:16 test.cr

No idea why you are getting different results...

Could be #2713. What if you write that string to a file from inside Crystal? Redirecting with > is known not to work very well in Crystal.

@pedantic-git
Author

Same problem for me (originally I experienced this in a Kemal app).

test.cr:

require "xml"
f = File.open "EnragedBull.svg"
xml = XML.parse(f)
File.write "output.svg", xml.to_s

Dockerfile:

FROM base/devel
RUN pacman -Sy --noconfirm crystal libxml2
WORKDIR /tmp
ADD EnragedBull.svg test.cr /tmp/
RUN crystal test.cr
CMD ls -lh

In a shell:

 ~/Desktop  docker build .
Sending build context to Docker daemon  2.197MB
Step 1/6 : FROM base/devel
 ---> 2e0e74301392
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Using cache
 ---> 08fb581d732a
Step 3/6 : WORKDIR /tmp
 ---> Using cache
 ---> fdde9a4562cf
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> 60dc1009df8c
Step 5/6 : RUN crystal test.cr
 ---> Running in 6da5c1e3d7c2
 ---> c7eb0ab0817e
Removing intermediate container 6da5c1e3d7c2
Step 6/6 : CMD ls -lh
 ---> Running in 71ee6aebcf8f
 ---> 35b8ddb17fe5
Removing intermediate container 71ee6aebcf8f
Successfully built 35b8ddb17fe5
 ~/Desktop  docker run 35b8ddb17fe5
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 12:46 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 13:22 output.svg
-rw-r--r-- 1 root root  100 Sep 11 13:21 test.cr

Could it be a difference between running Docker on a Linux kernel vs a Darwin kernel? Seems pretty unlikely! I would be inclined to blame my hardware but this is a new workstation since the bug was originally filed (they were both XPS13s with i7 processors, but 2 years apart).

@RX14
Contributor

RX14 commented Sep 11, 2017

I can reproduce:

$ docker build .
Sending build context to Docker daemon  2.197MB
Step 1/6 : FROM base/devel
latest: Pulling from base/devel
3a32adc5d06e: Pull complete
3c005aad0569: Pull complete
fcd7db7c97c1: Pull complete
cc43857431eb: Pull complete
44d26cc3e206: Pull complete
Digest: sha256:07d592e4b3409b6436230a6db84aa6bc8f8550acf95ccd48e0a5023ba3d19523
Status: Downloaded newer image for base/devel:latest
 ---> e0972358566d
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Running in 9bc277a32d55
:: Synchronizing package databases...
downloading core.db...
downloading extra.db...
downloading community.db...
resolving dependencies...
looking for conflicting packages...

Packages (5) libedit-20170329_3.1-1  libevent-2.1.8-1  llvm-libs-4.0.1-5  crystal-0.23.1-1  libxml2-2.9.5+6+g07e227ed-1

Total Download Size:    17.45 MiB
Total Installed Size:  125.58 MiB

:: Proceed with installation? [Y/n]
:: Retrieving packages...
downloading libevent-2.1.8-1-x86_64.pkg.tar.xz...
downloading libedit-20170329_3.1-1-x86_64.pkg.tar.xz...
downloading llvm-libs-4.0.1-5-x86_64.pkg.tar.xz...
downloading libxml2-2.9.5+6+g07e227ed-1-x86_64.pkg.tar.xz...
downloading crystal-0.23.1-1-x86_64.pkg.tar.xz...
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
checking available disk space...
:: Processing package changes...
installing libevent...
Optional dependencies for libevent
    python2: to use event_rpcgen.py
installing libedit...
installing llvm-libs...
installing crystal...
Optional dependencies for crystal
    shards: crystal language package manager
    libyaml: For YAML support
    gmp: For BigInt support [installed]
    libxml2: For XML support [pending]
installing libxml2...
:: Running post-transaction hooks...
(1/1) Arming ConditionNeedsUpdate...
 ---> 768a519f64a1
Removing intermediate container 9bc277a32d55
Step 3/6 : WORKDIR /tmp
 ---> 6b7c8d19733d
Removing intermediate container ca06e0f29d4c
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> 26333f691206
Step 5/6 : RUN bash -c "crystal test.cr > output.svg"
 ---> Running in 582a805c848f
 ---> 3f750795bdf3
Removing intermediate container 582a805c848f
Step 6/6 : RUN ls -lh
 ---> Running in f957b53a1589
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 10:36 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 13:25 output.svg
-rw-r--r-- 1 root root   80 Sep 11 10:36 test.cr
 ---> 8a37f1ec1e01
Removing intermediate container f957b53a1589
Successfully built 8a37f1ec1e01

@RX14
Contributor

RX14 commented Sep 11, 2017

Even with writing in crystal!

require "xml"
f = File.open "EnragedBull.svg"
xml = XML.parse(f)
File.write("test.xml", xml.to_s)

Sending build context to Docker daemon  2.197MB
Step 1/6 : FROM base/devel
 ---> e0972358566d
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Using cache
 ---> 768a519f64a1
Step 3/6 : WORKDIR /tmp
 ---> Using cache
 ---> 6b7c8d19733d
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> e0b2e5ed31b9
Step 5/6 : RUN bash -c "crystal test.cr > output.svg"
 ---> Running in ee6c73c5fb2d
 ---> f8e0dfdefd0e
Removing intermediate container ee6c73c5fb2d
Step 6/6 : RUN ls -lh
 ---> Running in 15fb33838858
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 10:36 EnragedBull.svg
-rw-r--r-- 1 root root    0 Sep 11 13:27 output.svg
-rw-r--r-- 1 root root   98 Sep 11 13:27 test.cr
-rw-r--r-- 1 root root 1.2M Sep 11 13:27 test.xml
 ---> 0e83cd04ae41
Removing intermediate container 15fb33838858
Successfully built 0e83cd04ae41

@RX14
Contributor

RX14 commented Sep 11, 2017

Considering @asterite and I have exactly the same base/devel hash, and Docker on Mac runs on a real Linux kernel, this seems incredibly strange.

Just as a sanity check, here's the sha1 of my EnragedBull.svg: 6219109e159c3b6df38a86f640270a3c9277c896
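
To run the same sanity check locally, something like the following sketch could be used. The helper name `sha1_matches` is hypothetical, and it assumes GNU coreutils' `sha1sum` is installed (on macOS, `shasum -a 1` is the usual equivalent and would need to be swapped in).

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-1 against an expected hex digest.
# Assumes sha1sum from GNU coreutils is on the PATH.
sha1_matches() {
  file=$1
  expected=$2
  # sha1sum prints "<digest>  <filename>"; keep only the digest field.
  actual=$(sha1sum "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "match"
  else
    echo "mismatch: $actual"
  fi
}

# Example usage, with the digest quoted above:
# sha1_matches EnragedBull.svg 6219109e159c3b6df38a86f640270a3c9277c896
```

A match rules out a corrupted or different download as the reason two machines produce different output sizes.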

@asterite
Member

@RX14 But outside docker it works fine?

@pedantic-git
Author

I know Docker on Mac no longer needs VirtualBox to run so perhaps it uses some more advanced Mac virtualization rather than a Linux kernel these days.

I'm just trying it out on the official Docker VM using docker-machine - will report back shortly.

@pedantic-git
Author

Yep - same problem on the official Docker VM:

 ~/Desktop  docker-machine create --driver=virtualbox crystal-test
[...snip creation stuff...]
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env crystal-test
 ~/Desktop  eval $(docker-machine env crystal-test)
 ~/Desktop  docker build .
Sending build context to Docker daemon  2.197MB
[...snip building stuff...]
Step 5/6 : RUN crystal test.cr
 ---> Running in 0994fa80a6c1
 ---> 0cfd5dbc0c48
Removing intermediate container 0994fa80a6c1
Step 6/6 : CMD ls -lh
 ---> Running in 3b16115b199b
 ---> 92b67fe284f8
Removing intermediate container 3b16115b199b
Successfully built 92b67fe284f8
 ~/Desktop  docker run 92b67fe284f8
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 12:46 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 13:34 output.svg
-rw-r--r-- 1 root root  100 Sep 11 13:21 test.cr

@RX14
Contributor

RX14 commented Sep 11, 2017

@asterite no, it's reproducible outside the container.

@asterite
Member

Then you can try to debug it, if you want and have time. The code is here: https://github.com/crystal-lang/crystal/blob/master/src/xml/node.cr#L424-L453

Maybe inspecting the values of these will be helpful: https://github.com/crystal-lang/crystal/blob/master/src/xml/node.cr#L434

@rdp
Contributor

rdp commented Oct 13, 2022

Unable to repro with

Crystal 1.2.1 [4e6c0f26e] (2021-10-21)
LLVM: 10.0.0
Default target: x86_64-unknown-linux-gnu

Maybe it was some now-fixed flushing issue or something...

@Blacksmoke16 added the kind:bug and topic:stdlib:serialization labels on Oct 13, 2022

@straight-shoota
Member

I suppose we can close this then.
