> Zig still can't proper handle UTF-8 strings [1] in 2022
There's plenty of discussion on this subject in basically every HN thread about Zig: the stdlib has UTF-8 and WTF-8 validation code, and ziglyph implements the full Unicode spec.
https://github.com/jecolon/ziglyph
You might not like how it's done, but it's factually incorrect to state that Zig can't handle Unicode.
> In a `recent` interview[2], he claims that Zig is faster than C and Rust, but he refers to extremely short benchmarking that has almost no value in the real world.
From my reddit reply to this same topic:
This podcast interview might not be the best showcase of the practical implications of Zig's take on safety and performance. If you want something with more meat, I highly recommend Andrew's recent talk from Handmade Seattle, where he shows the work being done on the Zig self-hosted compiler.
https://media.handmade-seattle.com/practical-data-oriented-d...
There's lots of bit fiddling that can't be fully proven safe statically, but the result is a compiler capable of compiling Zig code stupidly fast, and that's even before factoring in incremental compilation with in-place binary patching, with which we're aiming for sub-millisecond rebuilds of arbitrarily large projects.
> The ecosystem for zig is insignificant now and a stable release would help the language.
I hope you don't mind if we don't take this advice, given the overall tone of your post.
Why does something as basic as uppercasing a string or decoding latin1 require a third-party library? I would expect that to be part of stdlib in any language. Also, why does that third-party library come with its own string implementation? What if my dependency X uses zigstr but dependency Y prefers zig-string <https://github.com/JakubSzark/zig-string>? Basically all languages designed in the past 30 years have at least basic and correct-for-BMP Unicode support built-in/as part of stdlib. Why doesn’t Zig?
That's not "simple". Rust doesn't do either of those two tasks with just the stdlib, either!
- latin1 is dead and should be in no stdlib in 2022
- uppercasing requires the current Unicode tables, so, a largish moving target that you probably don't want to embed in small programs.
Latin-1 is actually the first 256 code points from Unicode. So, you can do that in Rust by casting u8 (the Latin-1 bytes) into char (Unicode scalar values). That's unintuitive perhaps because of course in C that wouldn't do anything useful since the char type isn't Unicode, but in Rust that's exactly what you wanted.
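A quick sketch of that claim: since Latin-1 maps byte-for-byte onto the first 256 Unicode code points, decoding it in Rust is just a byte-to-char cast (the sample bytes here are illustrative):

```rust
fn main() {
    // Latin-1 bytes for "café": 0xE9 is 'é' in both Latin-1 and Unicode (U+00E9).
    let latin1: &[u8] = &[0x63, 0x61, 0x66, 0xE9];

    // Each Latin-1 byte is numerically equal to its Unicode scalar value,
    // so `b as char` is a correct, lossless decoder.
    let decoded: String = latin1.iter().map(|&b| b as char).collect();

    assert_eq!(decoded, "café");
    println!("{decoded}");
}
```

Note this is exactly the cast that would be wrong in C (where `char` carries no encoding), but in Rust `char` is defined as a Unicode scalar value, so the identity mapping is the correct decoder.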
In this environment you might very well not need actual uppercase/lowercase but only the ASCII subset. Accordingly, Rust provides that too, and it's far less to carry around than the Unicode case rules. And since the ASCII case change can always be performed in situ (if you can modify the data), Rust also offers an in-place variant if that's what you want.
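To illustrate the ASCII-only variants in Rust's stdlib: non-ASCII characters pass through untouched, and the byte-buffer version works in place with no allocation and no Unicode tables:

```rust
fn main() {
    // Allocating variant: only ASCII letters change; 'é' is left alone.
    let s = "héllo".to_ascii_uppercase();
    assert_eq!(s, "HéLLO");

    // In-place variant on a mutable byte buffer.
    let mut buf = *b"zig and rust";
    buf.make_ascii_uppercase();
    assert_eq!(&buf, b"ZIG AND RUST");

    println!("{s}");
}
```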
Those are all valid points. At the moment, I believe Zig has decided to leave full Unicode support out of the stdlib because they don't want language releases to depend on Unicode updates.
The "rules" of Unicode change over time with updates to the Unicode standard. One big example is the grapheme-breaking algorithm, which has been updated over time to support things like the family emoji and other compositions.
correct-for-BMP-but-not-otherwise is simply a bug (and cultural chauvinism). And almost all such implementations aren't even correct for the BMP, because uppercasing Unicode is far from "basic".
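To see why even BMP uppercasing isn't "basic": case mapping can change a string's length, which is one reason it drags in the full Unicode tables. Rust's stdlib `str::to_uppercase` handles these cases:

```rust
fn main() {
    // The German sharp s (U+00DF, in the BMP) uppercases to two characters.
    assert_eq!("ß".to_uppercase(), "SS");

    // The 'ffi' ligature (U+FB03, also BMP) expands to three characters.
    assert_eq!("ﬃ".to_uppercase(), "FFI");

    // So per-code-point, in-place uppercasing is impossible in general:
    // char::to_uppercase returns an iterator, not a single char.
    assert_eq!('ß'.to_uppercase().count(), 2);

    println!("ok");
}
```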
> you get a compiler capable of compiling Zig code stupidly fast, and that's even without factoring in incremental compilation with in-place binary patching, with which we're aiming for sub-millisecond rebuilds of arbitrarily large projects
That sounds great! But at the same time people in other threads here are talking about 1-3 second compilation times for Advent of Code solutions (which I presume are smallish). Can you summarise where that really fast compiler comes from, to save me searching through that talk video? Is this something that everyday users will be able to use in typical workflows?
Long story short, we're currently working on a self-hosted implementation of the compiler, and what people are using now is the old C++ implementation. As soon as the new compiler is feature-complete enough, we'll start shipping it and we expect much better compilation speeds, which will improve even further for debug builds once the native (i.e., non-LLVM) backends catch up as well.
Have you, like, seen the release notes for 0.9.0?
https://ziglang.org/download/0.9.0/release-notes.html