Show HN: Brök – Find broken links in text documents (github.com/smallhadroncollider)
93 points by shcollider on April 19, 2020 | 39 comments


Interesting, when I saw the headline, I immediately thought it was something else that I want: I keep all of my notes as plain-text Markdown files, and I use Wiki-style links between them, like `[Another File](another-file.md)`. One of the pitfalls of this approach is that if you change a filename, then all the links referencing that file will then be broken. So I'd love a tool like this that can be used to clean up a directory full of these documents.
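
The local-link check described above can be sketched in a few lines of Python. This is a rough illustration, not how Brök works: the function name `find_broken_links` and the regex are assumptions made up for this example, and it only handles inline `[text](target)` links, not reference-style links or links inside code blocks.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [Another File](another-file.md).
# Simplification: reference-style links and code blocks are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(root):
    """Return (source_file, target) pairs whose local target does not exist."""
    root = Path(root)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip web URLs, in-page anchors, and mail links.
            if "://" in target or target.startswith(("#", "mailto:")):
                continue
            path_part = target.split("#", 1)[0]  # drop any #fragment
            # Relative links are resolved against the linking file's directory.
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(root).as_posix(), target))
    return broken
```

Calling `find_broken_links("notes")` on a hypothetical `notes` directory would return a list of `(file, target)` pairs to clean up by hand or with a follow-up script.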


I used to do that, but I switched to wiki-style [[links]] instead. But I can go back and forth with a regex.

There's some really interesting discussion about interlinked notes here: https://news.ycombinator.com/item?id=22767658


I'd love to hear more commentary about the pros and cons of this approach. I assume the main reasoning is that it’s easier to make a link?

The nice thing about the `[File](file.md)` method is that it’s Markdown-native: you can follow the links on GitHub, if you have a way to render the files as HTML your links will just work in the browser too, and many text editors can follow them by default.


The only time I would use the `[File](file.md)` syntax is if I'm specifically making documentation that will be published for other people.

Also, GitHub uses Gollum, which actually supports wiki-style links anyway.

If you are maintaining a knowledge base, writing out the full syntax gets really tedious when 99% of your links are just going to have the same caption as your filename. So being able to make links as quickly as possible facilitates efficiency. Besides, there are alternative syntaxes for specifying a custom caption for a wiki-style link.

Another thing to consider is that most of the media I use my knowledge bases in do not style my links anyway. I can still follow them automatically, of course, but they are just presented as [[Links]]. Most people prefer that.

My different wikis have gone through many different iterations, and most of these engines are fairly compatible-ish with each other. They are similar enough that any migration required could be done with a few lines of Perl on my set of md files.
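
The kind of migration mentioned above might look like the following, sketched in Python rather than Perl. The function names and both regexes are illustrative assumptions, not anyone's actual script: the sketch assumes page names match file names exactly (no spaces), ignores custom-caption syntaxes for wiki links, and drops the caption when converting to `[[...]]`.

```python
import re

# [Caption](name.md) -> [[name]]; targets containing ":" (URLs) are left alone.
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s:]+)\.md\)")
# [[name]] -> [name](name.md); pipe or custom-caption variants are not handled.
WIKI_LINK_RE = re.compile(r"\[\[([^\]|]+)\]\]")


def markdown_to_wiki(text):
    """Rewrite [Caption](name.md) as [[name]], dropping the caption."""
    return MD_LINK_RE.sub(lambda m: f"[[{m.group(2)}]]", text)


def wiki_to_markdown(text):
    """Rewrite [[name]] as [name](name.md), reusing the name as the caption."""
    return WIKI_LINK_RE.sub(lambda m: f"[{m.group(1)}]({m.group(1)}.md)", text)
```

Because the caption is discarded in one direction, a round trip is only lossless when captions already equal file names, which is the 99% case described above.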


Huh, ok I just tried this out and it doesn't work as well as I was hoping. It appears that the `[[A file]]` style works on GitHub Wikis but doesn't work in the repo browser itself (e.g., from the `README.md`)? That's a major limitation to me, because one of the big benefits of my approach is I get a free web version of every local Wiki that I have on GitHub.

I'm curious if you have another solution for actually publishing a `git` repo on the web using the `[[a file]]` link style?


Oh sorry, if you want it to work in your README.md, you have to specify the org-mode syntax. BTW, it supports org-mode syntax. Just have a README.org.


Nice, lots of great tips here, you've made a strong enough case that I’m going to try out this approach myself. Thanks for sharing!


Given that your file names are pretty unique words, does it really need to be harder than a simple search-and-replace?

sed -i 's/(another-file\.md)/(yet-another-file.md)/g' *.md


It could be automatic. I also do everything as .md and have been contemplating this for a while. Maybe I should get it done finally.
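
One way the automatic version could work is to do the rename and the search-and-replace in a single step. A minimal sketch, assuming a flat directory of `.md` files as in the `sed` one-liner; `rename_with_links` is a hypothetical helper name invented for this example, not an existing tool.

```python
import re
from pathlib import Path


def rename_with_links(root, old_name, new_name):
    """Rename root/old_name to root/new_name and rewrite links that target it.

    Returns the sorted names of the files whose text was changed.
    """
    root = Path(root)
    # Match old_name only when it is a whole link destination: "](old_name)".
    pattern = re.compile(r"\]\(" + re.escape(old_name) + r"\)")
    replacement = "](" + new_name + ")"
    (root / old_name).rename(root / new_name)
    changed = []
    for md_file in sorted(root.glob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        # A function replacement keeps backslashes in new_name literal.
        updated = pattern.sub(lambda _match: replacement, text)
        if updated != text:
            md_file.write_text(updated, encoding="utf-8")
            changed.append(md_file.name)
    return changed
```

Unlike the plain `sed` substitution, the pattern is anchored on the preceding `]`, so a bare mention of the file name in running text is left untouched.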


If you're not into Haskell, you might want to give my Python project a try:

https://github.com/jwilk/urlycue


Why would anyone care if `brok` is implemented in Haskell?


The name is a pun on "broke"? (I first thought it was Swedish for "brøk" (fraction/ratio), but it turns out Swedes don't say "brök" but "bråk", which just means "noise" in Norwegian and Danish. But that's way too convoluted a pun, even for Haskell).


I think it might be just a heavy metal umlaut [0], meant to look cool.

[0] https://en.wikipedia.org/wiki/Metal_umlaut


Bråk means fight in Swedish in the noun sense!


Or "division" in a mathematical sense


Why would anyone downvote my comment? It’s clearly relevant, entirely accurate, and charmingly polite.


I have no idea and I didn't so =\


Yes, bad pun on "broke"


Amazing! I made a shitty version[0] of this last year. I guess great minds think alike, but greater minds execute better!

[0] https://github.com/alexpapworth/detect-broken-links


Really cool! This could be implemented in Wikipedia to detect and fix broken links more easily.


This is cool although for practical purposes, I think most people who need such a tool will not be using Haskell. It is tempting me to add Haskell to my CI just because it looks so nice and easy to use.


> This is cool although for practical purposes, I think most people who need such a tool will not be using Haskell

Many people writing documents as text files use Pandoc, which is one of the most popular tools written in Haskell.


I didn't realize Pandoc was written in Haskell :)


Pardon my ignorance about compilers, Haskell, and programming in general (I have no background in CS, please don't roast me :), but cannot the Glasgow Haskell Compiler (GHC) produce a (binary executable) package that could run in Linux/Windows without having Haskell?


Yes, and the announcement even says:

"Binaries for Mac and Linux are available."


Except the latest release has only a binary for macOS.


Sorry about that, have built a Linux version now. (There wasn't a Docker container available for the build file I'd used at the time)


No problem at all. Thanks!


still doesn't mean you need Haskell in your CI environment.


Correct. I didn't say I needed it. I just suggested that I might do this.


Why would you have to "add Haskell to your CI" to run the tool?

You also don't have to install C++ and Rust compilers to be able to run Firefox, or a Haskell compiler to run Pandoc.


Very true although the latest binary is only released for macOS.


It seems all you miss by getting the previous version is the --no-color flag.


interesting. is haskell that big? don't they have a docker image you can pull in?

i know python and node and other languages have small images you can get for the CI situation you describe.


GHC is notoriously large. The compiler itself is around 1.5GB (this includes documentation, static and dynamic libraries).

To get an idea of the size of the bare-minimum system that's required to develop in language x, you can look at the cumulative sizes of the Nix package closures. This is what I get for ghc, nodejs and python3:

    λ nix path-info -rS nixpkgs.ghc | awk '{s+=$2}END{print s/1073741824 " GB"}'
    3.35463 GB

    λ nix path-info -rS nixpkgs.nodejs | awk '{s+=$2}END{print s/1073741824 " GB"}'
    1.05813 GB

    λ nix path-info -rS nixpkgs.python3 | awk '{s+=$2}END{print s/1073741824 " GB"}'
    0.485017 GB


I started installing haskell a lil while ago with macports but I got as far as

bash$ port rdeps ghc

The following ports are dependencies of ghc @8.8.3_0:

gnupg2 pkgconfig libiconv gperf gettext ncurses zlib xz bzip2 libassuan libgpg-error pth libksba libgcrypt readline gnutls autoconf automake libtool xattr unzip gtk-doc glib2 libxml2 icu libffi pcre libedit libxslt perl5.28 db48 gdbm docbook-xml xmlcatmgr docbook-xml-4.1.2 docbook-xml-4.2 docbook-xml-4.3 docbook-xml-4.4 docbook-xml-4.5 docbook-xml-5.0 docbook-xsl-nons itstool gawk py27-libxml2 python27 expat openssl sqlite3 python_select python2_select python38 python3_select py38-anytree py38-setuptools py38-six py38-pytest py38-setuptools_scm py38-py py38-packaging py38-attrs py38-hypothesis py38-sortedcontainers py38-zopeinterface py38-parsing py38-more-itertools py38-pluggy py38-wcwidth pytest_select py38-nose nosetests_select py38-lxml py38-pygments pygments_select py38-mock clang-9.0 cmake libcxx curl libidn2 libunistring perl5 texinfo help2man p5.28-locale-gettext libpsl curl-ca-bundle libarchive lzo2 lz4 zstd libuv libomp llvm-9.0 xar llvm_select clang_select ld64 ld64-latest libmacho-headers libtapi gmp libtasn1 p11-kit nettle libusb-compat libusb npth pinentry-mac openldap tcp_wrappers cyrus-sasl2 kerberos5 libcomerr coreutils alex stack happy HsColour python37 py37-sphinx py37-docutils py37-roman py37-setuptools py37-alabaster py37-babel py37-tz py37-pytest py37-setuptools_scm py37-py py37-packaging py37-attrs py37-hypothesis py37-sortedcontainers py37-six py37-zopeinterface py37-parsing py37-more-itertools py37-pluggy py37-importlib-metadata py37-zipp py37-toml py37-wcwidth py37-freezegun py37-dateutil py37-pytest-cov py37-coverage py37-mock py37-imagesize py37-jinja2 py37-markupsafe py37-pygments py37-requests py37-chardet py37-idna py37-urllib3 py37-certifi py37-snowballstemmer sphinx_select py37-sphinxcontrib-applehelp py37-sphinxcontrib-devhelp py37-sphinxcontrib-htmlhelp py37-sphinxcontrib-jsmath py37-sphinxcontrib-serializinghtml py37-sphinxcontrib-qthelp texlive texlive-basic texlive-common texlive-bin fontconfig freetype libpng ossp-uuid libzzip 
cairo libpixman xrender xorg-libX11 xorg-xtrans xorg-xorgproto xorg-util-macros xorg-libXdmcp xorg-libXau xorg-libxcb xorg-xcb-proto xorg-libpthread-stubs xorg-libXext xorg-xcb-util graphite2 fonttools py37-unicodedata2 py37-brotli harfbuzz harfbuzz-icu libpaper mpfr potrace xorg-libXp xpm xorg-libXt xorg-libsm xorg-libice xorg-libXaw groff ghostscript jbig2dec jpeg libidn tiff lcms2 psutils netpbm jasper jbigkit libnetpbm xorg-libXmu xorg-libXi xorg-libXfixes texlive-bin-extra latexmk texlive-latex detex latexdiff p5.28-algorithm-diff pdfjam texlive-latex-recommended pgf dvipng gd2 webp giflib t1lib dvisvgm asciidoc fop brotli woff2 texlive-context texlive-metapost texlive-xetex texlive-plain-generic texlive-fonts-recommended texlive-math-science texlive-fontutils lcdf-typetools ps2eps t1utils texlive-lang-czechslovak texlive-lang-english texlive-lang-european texlive-lang-french texlive-lang-german texlive-lang-italian texlive-lang-polish texlive-lang-portuguese texlive-lang-spanish texlive-luatex texlive-fonts-extra texlive-latex-extra texlive-pictures

And decided to give it a miss.


That seems to be a packaging issue with MacPorts. It looks like it's including build dependencies, too. It's pulling TeX Live and the Sphinx Python bindings for formatting documentation. I'm not sure what Perl is for, or why it needs 3 different versions of Python.

The Debian package doesn't require any of those. The Homebrew package has no dependencies at all (and only "python" and "sphinx-doc" as build dependencies).


On Arch Pandoc “only” requires a few dozen or so Haskell packages. Installing it does kind of mess up pacman’s output when updating etc, but that’s a very minor annoyance for a pretty useful tool to me.




