Almost Always Unsigned (graphitemaster.github.io)
161 points by todsacerdoti on Jan 2, 2022 | 263 comments


I've vacillated between one camp and the other over the years, but in 25 years of professional development, other than pointer arithmetic I can't think of a single time where I've dealt with numbers that exceed a trillion in either direction. Integer overflows are so rare that I can't even remember one happening with 64-bit values. Underflow, on the other hand... I've encountered that plenty.

I'm now back in the signed camp. Signed integers make it a lot easier to sanity check integer subtraction when negative values are disallowed (a common situation), and it's easy to reason about. Also, not converting between signed and unsigned avoids accidental conversion overflows due to different expectations, so keeping it signed bypasses that whole can of worms while maintaining maximum flexibility.
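A quick sketch of the sanity-check pattern described above (Rust syntax, names mine): with a signed type, "went negative" is a single, cheap post-condition.

```rust
// With signed arithmetic the error state is representable, so one
// comparison after the subtraction catches the whole class of bugs.
fn remaining(budget: i64, spent: i64) -> Result<i64, String> {
    let left = budget - spent;
    if left < 0 {
        return Err(format!("overspent by {}", -left));
    }
    Ok(left)
}
```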

Unsigned gives you 2x the positive numeric range, which 99.999% of the time you'll never use (and if you did, underflow detection would become a nightmare).

I can't think of a single time I've heard someone complain "I wish Java's long type supported up to 16 quintillion instead of a paltry 8 quintillion"

At the end of the day, code is for people, so keep it easy for people.


I think types should express intent.

This adds more 'value' to the code for people than allowing them to switch their brains off because something is 'unlikely' to happen. A large class of bugs exists because people assume stuff is unlikely.

I also think one has to distinguish between API and actual code.

If you write a lib that takes inputs that should never be negative you can express this without writing a single line of documentation about it (or if you do, requiring the user to read it) by using unsigned types in these places.

You can still use signed types in your actual library code if you want/must/believe in that but do not pollute the outside API with them if unnecessary.


In most languages that offer this distinction, unsigned integers also don't really express your intent in an effective way.

Imagine you have a bug in your program. If you intend for a value to be non-negative and use a signed integer, you get a negative value. Your code is broken. You didn't want a negative value. But you do have a very easy way of checking for this at runtime, at least.

Now imagine you instead use an unsigned integer. The bug is still there but now it manifests as overflow or underflow. Your program is still broken. Nothing about using an unsigned integer saved you. And worse, you don't have an easy predicate to check that something has gone wrong.
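To make this concrete, here's the same off-by-one bug seen both ways (Rust, illustrative function names mine):

```rust
// Signed: a negative result is representable, so `diff < 0` is a
// trivial predicate for "something has gone wrong".
fn signed_diff(len: u32, used: u32) -> i64 {
    len as i64 - used as i64
}

// Unsigned: the same bug wraps into a plausible-looking huge count,
// indistinguishable from a legitimate large value.
fn unsigned_diff(len: u32, used: u32) -> u32 {
    len.wrapping_sub(used)
}
```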

I think this can only be a reasonable choice if you are working in a language that has dynamic checks for overflow and so using an unsigned integer enforces not only the predicate that the value is nonnegative but also that all operations stay within its expected arithmetic domain.


If my intent is that a certain integer should never be negative, labelling it unsigned does a poor job at conveying this intent because any arithmetic just silently does the wrong thing. 3-4 is some huge number rather than “error”.


In Rust, 3u64 - 4u64 is "error", specifically

thread 'main' panicked at 'attempt to subtract with overflow'


It panics _in debug builds_ but wraps in release builds because including the checks in every arithmetic operation is not zero cost.


Though you can configure your release builds to panic as well. That can be quite useful if you don't expect trivial computations to be a large cost center (and you can always bench to check that assumption).

Even more so now that custom profiles have been added to Cargo (yay), you can enable overflow checking in release, and add a `release-unchecked` or whatever.
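With custom profiles, that setup might look something like this in Cargo.toml (the `release-unchecked` name is just the suggestion above, not a built-in profile):

```toml
# Checked arithmetic by default, even in optimized builds.
[profile.release]
overflow-checks = true

# Opt-out profile for code paths where the checks proved too costly.
[profile.release-unchecked]
inherits = "release"
overflow-checks = false
```

Then build the unchecked variant with `cargo build --profile release-unchecked`.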


Or, better in many cases, you can write what you meant explicitly in Rust.

3u64.checked_sub(4u64).expect("Overflow");

This allows you to spell out that you don't expect this to overflow, and Rust should panic if it does, regardless of your compiler flags.


Of course you can do that, but I would say that doing so defensively for every operation is way more verbose and unwieldy than you'd want.

Much simpler to just toggle on the relevant flag.


I think GP was saying you can do that when you expect that the general case doesn’t need to be checked but a specific case does.


> when you expect that the general case doesn’t need to be checked but a specific case does.

But surely that's almost never the case? Few programs actually desire or care for base 2 modular arithmetics, and when they do it tends to be for very specific tasks (usually cryptographic or cryptography-adjacent e.g. hashing, checksumming, ...).


> But surely that's almost never the case?

Which is why it’s not so impractical to make an occasional exception.


I wish they had "unchecked_sub()" instead, and default "-" would work like checked_sub().


unchecked_sub() exists in Nightly and so may some day come to stable

However, changing - to perform checked_sub() leaves you with an Option, and this makes ordinary arithmetic look pretty clumsy:

let x = (a-b).unwrap()-c;

The intent is to land more types with intrinsic behaviour. Today only the Wrapping type is provided, so Wrapping<i32> is an i32 that definitely has Wrapping arithmetic and won't panic on overflow, but eventually Saturating<i16> will be possible (e.g. for CD audio PCM samples, saturating arithmetic is correct; if you try to make the loudest possible noise louder it just stays the same) and so will Unchecked<i8> if you're really sure that you're doing 8-bit arithmetic that can't overflow and doesn't need debug checks.
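For what exists today, a sketch: Wrapping<T> makes the modular behaviour part of the type, and saturating arithmetic is already available as methods even where a wrapper type isn't.

```rust
use std::num::Wrapping;

// Wrapping<T> opts in to modular arithmetic at the type level.
fn wrap_demo() -> u8 {
    (Wrapping(250u8) + Wrapping(10u8)).0 // 260 mod 256
}

// Saturating arithmetic as a method: "the loudest possible noise
// stays the same" behaviour for an i16 PCM sample.
fn louder(sample: i16) -> i16 {
    sample.saturating_add(100)
}
```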


But then your function should check if the value is within reasonable parameters.

If it was an int at the function signature and you got -1 what would you expect to happen? It's the same thing.


> But then your function should check if the value is within reasonable parameters

How could you possibly check this using types that cannot represent what you are checking for?


> How could you possibly check this using types that cannot represent what you are checking for?

`f(u32 a, u32 b) { assert a > b ... `

(if the hypothetical following operation is `a-b` as discussed higher in the thread)


Better ensure that assert is not turned off in release mode then.


One of the reasons I don't like asserts.

Log the error and act accordingly, instead of believing this kind of issue only happens during development


How do you "act" if you detected your logic is faulty? In many situations aborting is the best choice.

If input data is faulty and the reason for that is not seen as part of your logic, then sure, log an error and skip.

I used to define my own assert macro to "ensure" my asserts aren't disabled, but I don't bother anymore. There's nothing wrong with assert, and you needn't define NDEBUG. It's important to be aware that there are different situations, and not all warrant aborting, as described above. Another distinction is that there can be asserts that must be disabled in release builds for performance reasons, and others that won't affect performance and can stay enabled.
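Rust, mentioned elsewhere in the thread, bakes exactly this distinction into the standard library; a sketch:

```rust
fn sub_strict(a: u32, b: u32) -> u32 {
    // assert! is always compiled in, regardless of build profile:
    // for invariants whose violation means the logic is broken
    // and aborting is the right response.
    assert!(a >= b, "precondition violated: a={a} b={b}");

    // debug_assert! vanishes in release builds: for checks too
    // expensive to keep but useful during development.
    debug_assert!(a < 1_000_000);

    a - b
}
```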


> because any arithmetic just silently does the wrong thing

The same thing, but worse, can happen with signed integers. (-3)-INT_MAX is either some huge positive number, or something crazy because it's undefined behavior and the compiler is allowed to do anything it wants.
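For comparison, the analogous operation in Rust (name mine) is fully defined, and the checked form turns the failure into a testable value rather than UB:

```rust
// C's (-3) - INT_MAX is undefined behavior; in Rust the checked form
// reports the unrepresentable result, and the wrapping form is defined.
fn near_overflow() -> (Option<i32>, i32) {
    let checked = (-3i32).checked_sub(i32::MAX); // None: out of range
    let wrapped = (-3i32).wrapping_sub(i32::MAX); // wraps to i32::MAX - 1
    (checked, wrapped)
}
```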


I guess what the parent is saying is that, while the same kind of bugs can happen with signed integers, they are less common than with unsigned integers because they typically involve very big or very small values such as INT_MAX or INT_MIN. This is due to the fact that the "most common values" (close to 0) are at the beginning of the range of unsigned integers, but at the middle of the range of signed integers, so the risk of underflow is lower for signed integers in practice.

I can understand where they come from, especially regarding error handling: for integers whose value must not be negative, it is easier to check for underflow by checking if the result is not negative rather than by checking that the result is not eg smaller than the previous value (eg for addition).

That being said, "number must not be negative" really ought to be encoded in the type system, so that the user of the type knows that they must actually look for underflow.

I guess that part of the problem is that we don't want to check for underflow after each operation, so checking the negativity is a way to "coalesce" several checks after multiple operations. However this is fragile, because the multiple operations could end up producing a positive value, even if some intermediate values were negative.

For full safety, I don't see how we could do better than checking after each operation right now. If we're doing this, I feel like checked_add and friends from rust is a better fit than cramming an unsigned int into a signed one.

I wonder if we could design an integer type with 63 bits of value, plus one bit of "overflow/underflow poison", such that any operation that would under/overflow would saturate that bit to one, but otherwise still perform the operation on the value part. That would allow to coalesce multiple checks while keeping safety even in the presence of multiple faulty operations. I wonder how it could be implemented efficiently though
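A rough sketch of that idea (hypothetical type, names mine): every operation proceeds on the value, but any overflow sets a sticky flag that can be inspected once at the end of a computation.

```rust
// Sticky-poison integer: overflow doesn't stop the computation, it
// just taints the result until someone checks the flag.
#[derive(Clone, Copy, Debug)]
struct Poisoned {
    value: u64,
    poisoned: bool,
}

impl Poisoned {
    fn new(value: u64) -> Self {
        Poisoned { value, poisoned: false }
    }
    fn add(self, rhs: Poisoned) -> Poisoned {
        let (value, overflow) = self.value.overflowing_add(rhs.value);
        Poisoned { value, poisoned: self.poisoned || rhs.poisoned || overflow }
    }
    fn sub(self, rhs: Poisoned) -> Poisoned {
        let (value, overflow) = self.value.overflowing_sub(rhs.value);
        Poisoned { value, poisoned: self.poisoned || rhs.poisoned || overflow }
    }
}
```

E.g. (3 - 4) + 2 ends up with value 1, back in range, yet stays poisoned — which is exactly the "coalesced check stays safe" property.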


Your idea for a coalescing overflow type has been explored in the document N2868 [0], a proposal for the C programming language. Instead of reserving a bit from the integer, each integer type is associated with a checked integer type, which holds an integer alongside an overflow bit. Checked integers are opaque and can only be constructed and deconstructed via macros; as I understand it, the idea is that the compiler can use the processor's preexisting overflow flag to represent the checked integer. In particular, the operations on checked integers have the property of coalescing overflow checks that you are looking for.

[0] Supplemental Integer Safety, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2868.pdf


> I wonder if we could design an integer type with 63 bits of value, plus one bit of "overflow/underflow poison", such that any operation that would under/overflow would saturate that bit to one, but otherwise still perform the operation on the value part. That would allow to coalesce multiple checks while keeping safety even in the presence of multiple faulty operations. I wonder how it could be implemented efficiently though

There is something like that already for floating point. Whenever an overflow or underflow happens, it sets a sticky bit in a separate flags register. You can clear these flags, do a sequence of operations, and at the end, see if any of these flags are set. See https://man7.org/linux/man-pages/man3/fenv.3.html for the standard C API for it.


An unsigned integer will never be negative, that intent is expressed clearly and correctly. What the operational semantics should be is unclear, because of the tradeoffs involved in signaling an error.

In terms of the C standard, "3-4" for unsigned ints is modular arithmetic, and the "wrong thing" is assuming that it will do anything other than wrap around. This is very clearly defined, and implied whenever you see an arithmetic expression on unsigned integers.


Conversion from signed to unsigned is implicit and silent. The expression being converted to your "can never be negative" type isn't nonnegative!!! Its value drastically changes; e.g. -3 becomes a huge number.
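Rust makes the same conversion explicit, which shows exactly the value change C performs silently (function names mine):

```rust
// The cast reproduces C's silent signed-to-unsigned conversion.
fn silent(x: i32) -> u32 {
    x as u32 // -3 becomes 2^32 - 3
}

// The lossless alternative makes the failure visible instead.
fn checked(x: i32) -> Option<u32> {
    u32::try_from(x).ok() // None for any negative input
}
```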


Not if you pass -Wconversion to GCC


Clearly defined, but very inconvenient and makes it too easy to write buggy code that looks correct on surface.


Inconvenient to whom? It’s pretty inconvenient not to do that if you eg have strict memory/performance requirements.

C made the right tradeoff IMO. You can protect yourself against overflow if you need to, but if it always signals errors you can’t turn that off.


This sounds like an issue with the language. 3-4 is not a large number and should not be by default. It should be a case handled by the developer.

Maybe you want to do some weird bit banging trickery, and that's fine too, but it should require an out-of-the-way function call.


> This sounds like an issue with the language. 3-4 is not a large number and should not be by default

Many languages optimize numeric operations and DX around numeric types for efficiency rather than correctness, and make the latter high-friction. (Wrap-around subtraction, decimal literals being treated as IEEE floats, etc.)

The exceptions (or cases where it is merely less true) tend to be very high level, dynamic languages.


"unsigned" doesn't express intent. It expresses implementation, maybe it expresses assumption.

Ada's "Positive" or "Natural" or

  subtype X is Integer range Y..Z;
expresses intent (well expresses requirement really), it also enforces that constraint, so it's not merely an assumption.
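For languages without built-in range subtypes, something similar can be approximated with a checked newtype; a hypothetical Rust sketch (names mine, runtime-checked at construction rather than statically like Ada):

```rust
// A rough stand-in for Ada's range subtypes: the bound is part of
// the type, and every construction site enforces it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Ranged<const MIN: i64, const MAX: i64>(i64);

impl<const MIN: i64, const MAX: i64> Ranged<MIN, MAX> {
    fn new(v: i64) -> Option<Self> {
        (MIN..=MAX).contains(&v).then_some(Ranged(v))
    }
    fn get(self) -> i64 {
        self.0
    }
}

// Roughly Ada's Positive and Natural:
type Positive = Ranged<1, { i64::MAX }>;
type Natural = Ranged<0, { i64::MAX }>;
```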


> I think types should express intent.

> If you write a lib that takes inputs that should never be negative you can express this without writing a single line of documentation

So, let's make an odd_int_less_than_20587 type and a billion of other types? "Not negative" normally helps only one tiny bit towards documenting anything about the accepted values. Making a separate type for that and complicating everything is ridiculous.

> If you write a lib that takes inputs that should never be negative

Have you ever compared the difference between two unsigned integers? Because unsigned integers don't have "natural" subtraction. Did you ever find the need for signed numbers to interact with unsigned numbers? Because if you did, it's a bad idea to separate them without need. Premature isolation is a prime cause of complexity.


> So, let's make an odd_int_less_than_20587 type and a billion of other types?

Yes, what's wrong with that ? Every variable's type should be defined as precisely as possible when meaningful, that's the whole point (and C/C++ not easily allowing to do that is one of their biggest drawbacks imho.)


There's a tradeoff around strictness of nominal types and evolvability, e.g. changing the threshold on such a strict type would be a breaking change.


Changing the threshold without signaling the change in the type system is also a breaking change. It's just that your compiler won't warn you about it when you inevitably forget to update your code. Instead your code is going to break at runtime.


Obviously if you have at most 8/16/32/64 bit types (only making new types because of physical constraints), you'll have way, way fewer breaking changes than if you try to encode bounds extremely tightly and try to update them whenever you use 2 instead of 1 somewhere.

In fact most software is designed such that the physical sizes are chosen first, and then the practical bounds follow from that.


Not necessarily, it depends on what you do with the value.


Are you seriously asking? Because what I suggested rhetorically would be the death of all your programs for the simple cause that line count would quadruple and you would have to touch half your codebase every time you make a tiny change.


When I make a change today like renaming a method I just let my IDE apply it to the codebase, why would that be different ?


Because if you change one type to include the number 235498 it will also affect 377 other types.


why would it ? if you set things such that expected ranges are set where they matter (e.g. where you do arithmetic) you should only have to change the types for which range issues occur, like this (very quick and dirty, likely wrong, but you get the idea): https://gcc.godbolt.org/z/4v6dW9czP


Not to hurt anyone's feelings but no I don't get the idea. I find this irritating and very unconvincing. So much work and boilerplate to prove peanuts. Have you ever seen anyone doing serious software development like that? I mean, not just toy examples of some 10-line bean counting program blown up to 100 lines.

> void foo(ranged_int<1, 5> a, ranged_int<3, 7> b)

Why would I _ever_ write a function like that? Practical proofs need to be about runtime values, not just constants and types. They also need to put multiple values in relation. They need to handle type casts and many more things. Constant integer ranges might be of some use but it's a tiny fraction of the invariants that a programmer juggles in practice, and the source code is already blown up out of proportion just for these ranges. This thing is just not gonna fly.

Just multiply some "good" values a few times and there is no way that you can statically prove that the result is going to be in the range as well, even though it might always be the case due to how the values combine each time (simple example: offset + size that the code checked to be within array bounds). Invariants in practice are relations over the runtime values of multiple variables. The focus on single values is basically already where types fail as a correctness tool.


> Just multiply some "good" values a few times and there is no way that you can statically prove that the result is going to be in the range as well, even though it might always be the case due to how the values combine each time (simple example: offset + size that the code checked to be within array bounds).

I don't understand, there's literally a CVE about that elsewhere in this thread, basically

    struct my_structure { 
      int size;
      int offset;
    };
either you restrict both size and offset to e.g. 2^30 so that you can safely sum both, or you will have an issue because you'll be doing if(size + offset < max_size) and size + offset overflows because absolutely no one can be trusted to write overflow checks, you gotta use a library for that.
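The library-free version of that check, in Rust's checked arithmetic (function name mine), is what the overflow-safe `if (size + offset < max_size)` has to become:

```rust
// checked_add reports overflow as None instead of wrapping past the
// comparison, so the bounds check can't be subverted by huge inputs.
fn in_bounds(size: u32, offset: u32, max_size: u32) -> bool {
    size.checked_add(offset).map_or(false, |end| end < max_size)
}
```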


If you're not making assumptions about the values stored in your variables, then every little arithmetic operation can lead to unexpected wraps (UB for signed types). What follows is that we need to make assumptions. Don't write an if conditional in front of every operation like a forgetful person - but assume that "my_structure" contains good values that you can safely add.

Usually it's very easy - require that size and offset are validly indexing an object, and require that objects are of a sane size (I rarely have in-memory objects larger than a few megabytes, and for most code I posit that it is a mistake to not chunk and stream huge datasets). Done, no real need to worry.

If you cannot make any assumptions about the values contained in a "my_structure" then that means the code is at an API boundary, which is the perfect place where such check (if-statement) must be placed. At internal code places, checks are wasted effort, and at most a compiler switch to detect wraps (logic errors) would often be a much better choice than paranoid code where each line of code can't trust the previous line, and where one can't see actual work being done because it's all covered in mistrust. Unnecessary mistrust (i.e. mistrust at places strictly inside maintenance boundaries which are not strategic checkpoints) only makes it hard to write code that actually does something, and might possibly cause more bugs than it could ever save.


> So, let's make an odd_int_less_than_20587 type and a billion of other types?

Bounded integer types have been extremely useful for me on a few occasions, and I wish more languages had type systems powerful enough to support them. They also address your concerns about subtraction.


> So, let's make an odd_int_less_than_20587 type and a billion of other types?

Yees?

Bounded integers are a relatively common feature of programming languages (and one which is sorely missing from Rust, it's technically possible currently but not super fun, when fleshed out enough const generics should make them much nicer, possibly even built-in)


What is the point of bounded integers? Most quantities don't have natural bounds, and practical bounds only arise from physical limitations (8/16/32/64 bit etc).


Those practical bounds have a tendency to percolate into bounding other things. So you can only have 2**10 foos because they're indexed by a 10-bit bitfield somewhere? Now maybe you can only have 204 bars, because every bar has at least 5 foos. And so on.

If you genuinely wanted to reason solidly about such limits, you do end up needing arbitrarily bounded integers. Of course, most people don't want to do that, for good reasons. And even if you wanted it, there's a reasonable question whether adding them to the language syntax is really the right thing to do.


Yup, that kinda was my point, and where you say "at least" that is also a good point why setting up some complicated static system to maintain limits is not practical. Hence my philosophy has been for a long time to set the physical constraints that just have to be decided (and do that as best as I can). And then to not worry (more than a tiny bit) about statically deciding the rest, or it'll end in a nightmare.


> So, let's make an odd_int_less_than_20587 type and a billion of other types?

Yes, and at least in C++ it isn't that difficult to trivially generate these types as you need them with a modicum of template-fu. You don't need that many integer type templates to capture most of the common cases and provide contextually sane operators. This might be a little more difficult in other languages.

This is pretty standard type safety practice and usually well worth it in terms of writing robust low-defect code.


Show me where this is standard safety practice?


Eh, what? This has been idiomatic for many years for C++. Not doing it, especially today, is just malpractice. Every company I've worked at since I was doing C++ professionally required this. There is no accounting for how random companies do their software engineering but that doesn't mean competent software engineering organizations don't exist.

I do database engines. This has been a standard part of code standards for a really long time, at least as long as the programming languages we used practically supported it. I've done it ever since. The only type that is allowed to flow through the code is size_t, where it makes sense.


I'm really curious what that means and how it looks. I assume it's casually going beyond simple typedefs for types whose size might change? And what does "flow through the code" mean here? That you can't mention "int" or "uint64_t" anywhere? That you can't pass these things as arguments?

Are there any open source examples to study this style? I don't know where to look, but admittedly studying C++ codebases is not what I do for fun.

In my personal experience, putting up layers of abstraction to hide physical representations has never worked. It makes the implementation code really hard to grok, and your hands are bound in the code because you're not allowed to _assume_. What worked for me instead is making sure that certain decisions that are likely to change are contained inside a module and, for the rest of the code, hidden behind an API or simply not accessible. Runtime assertions can be useful too, but maybe that is part of what you are describing by "template-fu". Finally, I try to stay true to my decisions and am not trying to act like I might have to replace floats with doubles at every corner. Some code that I've seen was littered with templates for that single reason, and I'm pretty sure the developers didn't have an intimate understanding of how floating point works.

Interestingly, size_t is one of the things that I tend to avoid, and in my perception a majority of the programmer population agrees it was a mistake to have it be an unsigned type. A type which can represent "the size of an object" is too general for all but maybe OS APIs IMO. Usually I know that the size of my objects is bounded by a certain size, well below what fits in 32 bits.


> So, let's make an odd_int_less_than_20587 type and a billion of other types?

I see your point, but I think that (while not always), it can still be pretty helpful to add your custom types to make invalid states unrepresentable.

This can sound ridiculous on something like C, but if your language has refined types, it's not that bad.

For example, this doesn't look awful to me:

    type WeirdId = Int Refined (Odd And Less[20587])

    val myId: WeirdId = 5 // OK
    val myInt: Int = myId // Can be used as an Int
    val myRuntimeId: Either[String, WeirdId] = refineV[Odd And Less[20587]](myId + 2) // Runtime check
    val myInvalidId: WeirdId = 6 // Compilation error

https://scastie.scala-lang.org/qoSHgL4PQCW6lC2MUHc5YA


I've spent some time pondering things like that, but ultimately concluded that we should keep these things away because there is a duplication between type-level and code level, and I'm completely fine with sprinkling a few runtime assertions in most cases. Normal code syntax is much more suited to expressing invariants. Obviously I'm not into statically proving the correctness of my programs, so if that's your cup of tea things are probably looking different (and I'm sorry for you).


I would love to be able to do more formal proving of programs in practice, but I 100% agree with you. A lot of what the advanced typing crowd seem to be doing is to re-invent programming but at the type level, and it's unclear to me why that should count as progress. It reminds me uncomfortably of C++ template metaprogramming -- and for good reasons, C++ has made steps towards replacing that with statically evaluated expressions written in regular syntax.

My gut feeling is that the right way forward is to have a fairly standard type system visible in the source language augmented by contracts written in regular code. Basically a slight augmentation of assert(). A formal prover could internally reason about an extended type system where those contracts are automatically lifted to the type level if that helps for some reason, but programmers wouldn't be bothered with it.

This seems unlikely to be a genuinely novel idea, so I'd be curious to hear whether systems like that already exist.


Exactly. It pains me when someone comes up with a new language with dependent types, but then to encode a simple constraint they first define integers via peano axioms using some kind of template metaprogramming. (The example I'm remembering was Coq or a Haskell dialect or something.)

There was a dialect of C#, Sing# or Spec#, that allowed you to specify pre- and post-conditions in the same expression language as the main language, and they were checked at compile time. This is what comes closest, I think.


Totally!

This is one of the best features of the pascal family:

http://www.delphibasics.co.uk/Article.asp?Name=Sets

(look in "SubRanges")


Almost useless in my opinion (because not a lot of practical invariants can be expressed with simple ranges) and will only create confusion when using non-0, non-1 based indexing and also create confusion about the representation/the space required for representation. The examples shown there are similar to the Animal-Cat-Dog examples that are popular to "prove" that classes and inheritance are useful.

That, plus the type incompatibility that arises from mixing such types with other integer types.


Have you ever used Pascal?

The ones that used it like it...


I have worked on an enterprise application written in Delphi full-time for 6 months.


Not every type pays its way, but the distinction between N and Z is well worn and useful. N measures the size of things while Z is a little more abstract. I think u64 can quickly convey a powerful intent.


If we're talking about N and Z (which makes sense since most programs are grounded in a mathematical model with some physical constraints added on top), then please acknowledge that N doesn't have subtraction defined, which illustrates why it's not very useful for most code.

You could also argue that subtraction for unsigned numbers is only partially defined, just like for signed numbers. But that would be missing the point, since most numbers in practical programs are small, and for these numbers the signed range is indeed much more useful since subtraction is defined for signed integers (e.g. i32) for small numbers. This is not the case for unsigned, where 3u-4u is usually not what you want.


Even without subtraction defined, N is useful. Partial subtraction on N, as it is, is also useful. When I'm dealing with counts I pretty explicitly either want truncated subtraction or warnings when I overflow.

Perhaps the biggest issue all together is just that many languages will let you do 3u - 4u instead of outright making the syntax more obviously annoying to force you to think about the caveats and pick between {truncated, throwing, overflow} variants.
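Rust's method names are one example of exactly that "pick a variant" syntax (function names mine, wrapping the standard methods):

```rust
// The three deliberate flavours of unsigned subtraction, so that
// 3 - 4 is never an accident:
fn truncated(a: u32, b: u32) -> u32 { a.saturating_sub(b) }     // floor at 0
fn throwing(a: u32, b: u32) -> Option<u32> { a.checked_sub(b) } // caller decides
fn modular(a: u32, b: u32) -> u32 { a.wrapping_sub(b) }         // explicit wrap
```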


I think you assume my comments are regarding C/C++.

They are not. I write mostly Rust these days.


I don't really care, why would Rust be much different in that regard?


It does not express intent. I remember being a junior C programmer and using "unsigned" to express my intent (it's greater than or equal to zero).

In reality, like all C keywords, it's completely misleading and actually really means "weird_arithmetic_mode_that_will_mess_you_up".


I don't know what's misleading about it or how the arithmetic is weird?


Signed and unsigned integer types are both finite rings (arithmetic modulo 2^n), rather than the infinite ring of integers used in most math. That difference makes the arithmetic "weird", but going to signed integers doesn't change anything; anything except bigints (or generic arbitrary-precision arithmetic) is a form of "weird_arithmetic_mode_that_will_mess_you_up" if you pretend you're operating over the real numbers.

Not sure what they mean about it being misleading though. They might prefer it if they were called "modular integers" instead of "unsigned integers" but that would lead to confusion with modules IMO.


>Not sure what they mean about it being misleading though. They might prefer it if they were called "modular integers" instead of "unsigned integers" but that would lead to confusion with modules IMO.

Modular integers to me would seem very confusing. They're integers without the sign, and they have some max value determined by the number of bits, I can't really think of a better name for them except for the fixed-width versions which include that bit information.

Extremely intuitive, simple English words, exactly what it says it is (integer without the sign bit).


Imagine a parallel situation. There is a type `NonEmptySet`, with an `intersection` operation. Now, given `NonEmptySet(1, 2).intersection(NonEmptySet(3, 4))`, and the requirement that `intersection` produces another `NonEmptySet`, what does that return? How about a set containing everything except 1, 2, 3, 4? Does that seem reasonable? That's analogous to what `(unsigned)3 - (unsigned)4` does.

In short, the operations on `unsigned int` do not really behave like "non-negative integer"; as such it's a weird match to use them for that purpose.


I don't really see it, unsigned says there's no sign. Given that and the number of bits it's exactly the expected behavior that it will not be able to produce a sensible result for subtracting something larger from something smaller.

Whether the operation then errors out or wraps doesn't change anything about unsigned being the right name for it, because it's something you wouldn't expect unsigned to do.

The intersection of your two NonEmptySets produces an empty set. If you try to store that result in a NonEmptySet it's the same problem: it doesn't make sense to expect a NonEmptySet as the result of that operation.


In a perfect world, this would be so. And if there were types that were performant AND could enforce such invariants without the downsides, I'd welcome them gladly. But primitive types come with baggage and limitations, so you have to make compromises. And the compromises when using unsigned ints are just too costly in my opinion. They're a good tool for specific use cases, but not as good as signed for general purpose code.


Could you list some of the downsides & compromises in the case I made?


- silent unintended underflow behavior

- dangers of mixed signed/unsigned code (signed values MUST exist, but unsigned values don't have to)

- more cognitive load to read and understand unsigned underflow detection code + issues once you exceed the halfway point

This is why Java offered signed integer types only (I think they should have allowed unsigned as well, but for different reasons)


> silent unintended underflow behavior

Please show me an example in the wild that could not be trivially caught/alleviated. With 'trivially' I mean the code that needs to be written to prevent it, not the thinking you need to do to consider it.

As for the latter: this is what I get paid for. Writing the code out is necessary but not what I really get paid for (unless I were a very slow typist, that is).

I find that most of the time I was simply too lazy. Or rather: I wrote C/C++ code for 20+ years. I used int and didn't think (unintended rhyme).

Now I write (mostly) Rust I find myself considering the edge cases regularly and also covering them because the compiler forces me to. See also this comment [1].

I.e. a - 1 + b where a is always non-negative (i.e. >= 0) and b is always > 1 'just' works with int (and unsigned [overflow] too!) in C/C++.

In Rust you will get a panic when a = 0 which will then make you think what you're doing and simply reorder this to a + b - 1.

I find that this is the 'cognitive load' that needs to be applied commonly and I'm pretty fine with that.
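
A minimal C++ sketch of that example, assuming 32-bit unsigned arithmetic (the helper names are made up; in a checked language like debug-mode Rust the first form would panic when a = 0):

```cpp
#include <cstdint>

// caller invariant: a >= 0 (trivially true for unsigned), b > 1
uint32_t risky(uint32_t a, uint32_t b) {
    // when a == 0, the intermediate a - 1 wraps to UINT32_MAX and the
    // subsequent + b wraps back; the result is right only by modular accident
    return a - 1 + b;
}

uint32_t reordered(uint32_t a, uint32_t b) {
    // no intermediate ever leaves the valid range, so this form also
    // survives a checked build (e.g. Rust's debug-mode overflow panics)
    return a + b - 1;
}
```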

> dangers of mixed signed/unsigned code

Alleviated by simply not mixing. There is a reason languages like Rust simply do not allow this without explicit typecasts. And those, in cases where one type is signed and the other is not and they have equal width translate to "all bets are off".

> more cognitive load to read and understand unsigned underflow detection code [...]

See first point. IMHO this is simply the same point expressed differently. Underflow happens when you subtract.

Ensure you do not subtract a larger value from a smaller. This is as easy as writing max(a, b) - min(a, b) or what author of [1] does.

Same as abs(a - b) but works with unsigned. Cognitive load? I do not see any and I agree with the author of the article that this is easier to read.
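
A sketch of that underflow-proof absolute difference in C++:

```cpp
#include <algorithm>

// absolute difference that never underflows, even for unsigned operands:
// the subtraction is always larger-minus-smaller
unsigned absdiff(unsigned a, unsigned b) {
    return std::max(a, b) - std::min(a, b);
}
```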

[1] https://news.ycombinator.com/item?id=29767877


This is one of those "If everyone would just do X" kinds of situations, which is why I'm against it. Things go wrong all the time - in fact it's a miracle anything works at all. So when I see "code that needs to be written to prevent it", I let out a sardonic laugh; people just don't work that way.

People WILL forget to check, or implement the check wrong (and it won't be discovered until the next black swan event), or some invariant will change in future and nobody will know to update this code. The code doesn't care because the compiler doesn't enforce the invariants, so you're left with bare and likely confusing (and probably eventually undefined) behavior.

I for one would LOVE it if popular languages could be instructed to inject invariant enforcement code (not asserts) at certain boundaries. It would help with a whole host of problems!

When you cross zero with a signed integer calculation, it's easy to find out, and the code has low cognitive overhead. Also, with code generally structured this way, bad input propagation tends to get stopped earlier because of the number of functions that check against negative values.

It's also possible to underflow check with unsigned, but the code is more complicated and can't stop propagation if it fails, and if you are allowing values > the signed positive limit (or unwittingly allowing it), your overflow detection code gets even more complicated (or wrong) due to the reduced overhead space. You COULD do a bunch of checks beforehand, but that's not always feasible due to the combination of operations being done on the operands, and once again people are people.

Even though many modern languages block uncast type conversions, that still doesn't protect you from converting a negative integer to an unsigned integer. Yes, you SHOULD be checking before the cast, buuuuuuut...

What we want to do is reduce the number of ways things can go wrong, reduce the fallout of an errant process, and minimize cognitive load to reduce bugs overall. Making that worse just to double the positive integer space doesn't make sense to me.


Here's an example:

  unsigned int length_a = 10;
  unsigned int length_b = 20;

  if (length_a - length_b < 0)
  {
      printf("negative diff\n");
  }
  else
  {
      printf("positive diff\n");
  }
I would argue that there is plenty of code like that in the wild. It looks innocent and harmless but it gives a completely surprising result.


If you ignore the compiler warning. This is trivial for the compiler to detect.


only in the most trivial cases

https://gcc.godbolt.org/z/YWPP5qdbx


I don't understand why the programming language can't promote the temporary to a larger signed int. Of course you can't change it in C anymore, but it's like this in most languages. But you rarely ever want unsigned modulo arithmetic here (and if you do, it should be special syntax).


-Werror


"This value should never be negative, so I'll use an unsigned type." There are two types of programmers: ones who are reassured by that reasoning, and ones who shudder at it.


Yeah, "should not" and "can not" aren't the same at all.


In C or C++, if you write a function void api(unsigned), you can cheerfully call it as api(-42) or api(b - c) where b and c are ints.

This is a poor, poor substitute for contract checking.
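
A sketch of that silent conversion (the api helper is hypothetical; the wrapped values follow from the modulo-2^N conversion rule for unsigned types):

```cpp
#include <climits>

// hypothetical API whose parameter "should" never be negative
unsigned api(unsigned n) { return n; }

// both of these compile without a cast in C and C++:
//   api(-42);    // -42 silently converts to UINT_MAX - 41
//   api(b - c);  // a negative int difference converts the same way
```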


Types aren't always expressive enough for that. I usually can't specify that a number should always be negative for instance, or that there's no 0.

My preference is to minimize surprise. If everyone is used to using signed types for everything, you better have a good reason for using something else, and leave that justification in a comment


> I think types should express intend.

Yes, and let's start with overflow behavior because it's much more relevant than the type domain.

If overflow isn't encoded by the type, there is just no way to correctly set a strict domain.


What you want is range subtypes of signed integers.


Important: without silent conversion. Languages like Pascal and Ada require explicit conversions in more places to make that sort of thing useful.


It also allows creating type checked "new" numerical types for semantics purposes to help improve interfaces. This helps prevent things like accidentally passing an integer in the wrong units for example. Two of the worst offenders I've seen of this is: "Is this uint64_t (or float) "dt" parameter in ticks, seconds, milliseconds, microseconds or nanoseconds?" and "Is this angle parameter in radians or degrees?"
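
One minimal way to sketch such a unit-checked wrapper in C++ (the Seconds/Milliseconds names are made up for illustration; std::chrono::duration is the production-grade version of this idea for time units):

```cpp
// made-up strong wrapper types: the wrong unit no longer compiles
struct Seconds      { double value; };
struct Milliseconds { double value; };

// the signature now documents and enforces the expected unit;
// speed(10.0, Milliseconds{2000.0}) is a compile error
double speed(double meters, Seconds dt) { return meters / dt.value; }
```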


And why stop there? What if I have a function that accepts values in the range [0, 100]? Make a type to express that? What if I have a function that takes only positive integers (i.e. unsigned integers minus 0)? Chances are that you should validate your input anyway. Also, what if a function takes an unsigned and a signed value is passed? In most situations you only get a warning (or not even that) and the function receives a completely wrong input. Isn't it better to take an int and validate it before doing anything with it?

You see, in the end it doesn't make a lot of sense to distinguish between positive and negative. From a mathematical point of view, negative integers have had the same standing as positive integers for a couple of centuries. The same operations that apply to positive integers apply to negative integers, and not only that: an operation between two positive integers can just as well result in a negative integer!

The unsigned type was created just to have one more bit if you *are sure* you have only positive integers. That may have made sense in the era of 8/16-bit computers. Now that we have 32- and 64-bit computers, does that extra bit really count for much? I don't think so... 2 billion is plenty for most usages.


> Integer overflows are so rare that I can't even remember one happening with 64-bit values. Underflow on the other hand... I've encountered that plenty.

I totally agree!

After some horrible bugs caused by unsigned integers in innocent looking C/C++ code, I'm 100% in the signed camp. The only time I use unsigned is when I need well-defined overflow behavior (which is rare).

As a bonus, using signed loop counters in C/C++ code enables certain kinds of compiler optimizations (based on the fact that signed integer overflow is undefined behavior): http://blog.llvm.org/2011/05/what-every-c-programmer-should-...


The loop optimization you point to can be done for unsigned loop counters even without exploiting undefined behavior. That's in part why the standard C practice was to use unsigned loop counters.


Fully agree with this, although one case where Java’s lack of unsigned integers is a quite bad is in representing binary data because the byte type is signed. I recently had cause to write some Java code that did a lot of modular arithmetic with arrays of bytes and oof, what a mess. There were four or five subtle bugs caused by the unexpected signedness of the byte type. I also wrote the same thing in Julia using UInt8 vectors and had no such bugs.


Java technically doesn't even have non-array bytes (or shorts); all types are just integers, either 4 or 8 bytes. So if you see a declaration of byte/short (and oftentimes char), it's likely a mistake.

The correct way is using ints, e.g. "int x = b[i] & 0xff;". Pretty much you should do math with int and long only. java.util.Integer/Long nowadays have static methods to work with unsigned types if you need them.

Once you remember that you need (the annoying) "0xff", you'll be free from errors. It has been there for 24 or so years. Other than that wrap your arrays into ByteBuffer and use provided utility.
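
The same pitfall exists in C and C++, where a sign-extending byte type plays the role of Java's byte; a sketch of the masking idiom (helper names are made up):

```cpp
#include <cstdint>

// int8_t sign-extends on promotion to int, just like Java's byte
int without_mask(int8_t b) { return b; }

// masking keeps only the low 8 bits, recovering the unsigned value
int with_mask(int8_t b)    { return b & 0xff; }
```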


Citation needed? Java primitive fields are padded for alignment, but that's not unique to java.

https://www.baeldung.com/java-memory-layout#layout


It's pretty much a part of the bytecode spec [0]. The stack model has only 4- and 8-byte operands. Yes, it can convert int to byte, as method signatures actually need the cast. However, an operation involving two bytes (say add, sub, shift) results in an int and requires a downcast back to byte. "b += anotherByte" sums b and anotherByte and then downcasts automatically.

The memory layout of the classes is quite immaterial here, as the operations happen within CPU registers.

[0]: https://docs.oracle.com/javase/specs/jvms/se12/html/index.ht... [1]: https://en.wikipedia.org/wiki/List_of_Java_bytecode_instruct...


This is true, but if your runtime is actually running stack-based interpretation of your Java code (rather than JIT) you have bigger performance problems.

    static int square(int);
       0: iload_0
       1: iload_0
       2: imul
       3: ireturn
is Java; compare to the C++ equivalent (using unsigned to avoid any possibility of undefined behavior):

    unsigned char square(unsigned char num) {
      return num * num;
    }
Which compiles to:

        mov     eax, edi
        imul    eax, edi
        ret
on x86

        mul     w0, w0, w0
        ret
on arm.

By this definition, c and c++ don't have "real" byte types either, because they undergo integer promotion for operations, and the generated code is operating on register sized values, not bytes. Java bytes use 8 bits as part of an object (with a fixed-cost padding); 8 bits as part of an array, etc. The fact that the bytecode has intermediate casts is not relevant to the actual code executed on CPU.


> By this definition, c and c++ don't have "real" byte types either, because they undergo integer promotion for operations, and the generated code is operating on register sized values

They're not actually operating on register-sized values: the code labelled x86 is also the x64 code, meaning GCC operates on 32b when the native registers are 64b. Likewise ARM.

Your assertion is even more debatable when Clang actually operates on 8-bit registers on x64:

        mov     eax, edi
        mul     al
        ret
(it does not on ARM64 as `mul` is only defined for 32 and 64b)


I don't write Java, but a good way to deal with this might be to cast to int (a larger type) immediately after loading a byte and before working with it. So, complexity-wise, the overhead of signed byte vs unsigned byte should be at most two casts, and often casting to a larger type would be required anyway even for operations with unsigned byte values.


> I can't think of a single time I've heard someone complain "I wish Java's long type supported up to 16 quintillion instead of a paltry 8 quintillion"

That Java has no support for unsigned integers is a very common complaint.


It's one of my biggest complaints attempting to do systems programming in the language, that and the lack of 64bit mmap.


I have no issue with the lack of unsigned - that's ok. It's a trivial workaround, as it's rare enough. However the lack of 64bit indices over memory mapped files blows indeed, with no easy (performant) replacement. It's doable but it'd require a shim over an array of ByteBuffer with the double indirection and pretty much inability for the JIT to optimize away all the bound checks.


I've had to resort to rolling my own mapping layer using JNA. Fortunately, 64-bit mmap support is being added as part of the Panama foreign-memory project.


I've hit the 2147483647 limit of indices occasionally in VFX/Graphics with large datasets: memory efficiency was also important, so 32-bit was a necessity (random access is required, so can't dynamically pack), so I tend to use unsigned these days (uint32_t) within arrays, but native 64-bit lengths (normally size_t) as standard local variables, and I normally go with unsigned there as well.

Several APIs in the VFX arena (Pixar OpenSubD, Pixar USD, and Foundry's Katana GeoLib API) all use signed ints for indices which I've occasionally argued against, with the response being "Google's coding standard says don't use them".


Yup, and that's a case of an API not aging well. But even in that case, switching to unsigned is only kicking the can down the road, not solving the problem. You'd get another year or two and then hit the 4294967295 limit.

As for 64-bit values, unless you can foresee requiring > 8 quintillion, unsigned isn't buying you anything.


Resource consumption does matter though, for example CPU caches or the quantity of RAM within a system. If their working set does fit within 32 bits of index, the intentional design choice very likely correlates to a measurable improvement in speed due to cache locality and system resource utilization.


Indeed it does matter! As I said this is an example of an API that hasn't aged well (when they devised it, they probably didn't envision this happening).

There are a number of ways to handle this:

- Use unsigned 32-bit (gives you a little bit more runway, but you'll still hit the end hard)

- Use 64-bit values (ends your problem once and for all, but costs double the memory so now you have a new problem)

- Use 40-bit values (Gives you a lot more runway, but costs CPU)


TBH I think this is pure laziness.

I looked at OpenSubdiv extensively. There are no good reasons to use signed in the cases I opened the ticket for and making sure the code inside the lib does the right thing with unsigned arithmetic is trivial.

But someone still needs to do that and writing that "Google (essentially) says you don't need to" in their style guide is much less work. ;-)


It isn't about squeezing out extra range. A silent overflow mechanism with predictable behavior is useful and can't be cheaply replicated with signed numbers.

Timestamp arithmetic is a common case where unsigned integers with silent overflow is superior. You can avoid bounds checks and special cases by always computing a delta between a newer time and an earlier time. As long as you can guarantee that the sample period is less than the clock rollover period there will never be any ambiguity.

Ada tried imposing the purity of signed integers and had to add modular types to get these semantics because of their utility.
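
The timestamp pattern above can be sketched in C++ (32-bit counters assumed; the helper name is made up):

```cpp
#include <cstdint>

// delta between two samples of a free-running 32-bit counter; correct
// across a single wraparound as long as the true elapsed time fits
uint32_t elapsed(uint32_t later, uint32_t earlier) {
    return later - earlier;   // well-defined modular subtraction
}
```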


Actually I had a colleague that always complained to me that Java didn't have unsigned types. He was of course a C++ programmer.


c'mon java does have an unsigned type, just that it's 16bit only (char)


> For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

http://www.gotw.ca/publications/c_family_interview.htm

Gosling on why Java only has signed arithmetic.


I’ve seen projects that use a mix of unsigned int, size_t, and off_t variable declarations with implicit conversions between them. This seems like a security hazard if you are handling user input because just checking the original input may be insufficient. size_t in general seems extra ridiculous because max(size_t) is a common sentinel value returned by methods and if you aren’t careful and add to max(size_t) it wraps around to -1.


A bonus is that you can enable UBSAN to automatically and efficiently catch signed integer overflow, without changing language semantics (since, unlike unsigned overflow, signed overflow is undefined in C/C++). Here's an earlier post of mine showing how: https://news.ycombinator.com/item?id=24578534


A lot of processors around you are still 16 and 32 bits. I’ve hit the overflow a lot too.

> At the end of the day, code is for people, so keep it easy for people.

This is a dangerous heuristic, because people here means exclusively developers. It’s easy for us to trade annoying but mostly harmless bugs for rare but harmful ones.


Yes, I hit it a lot during my Z80 days, too. I also had to do manual memory management using a memory-mapped page register. We all have our constraints we must operate under, and a bag of tools to help deal with those constraints.

But "unsigned by default" won't save you here; all it does is give you a little more breathing room for your special case code (and hey, maybe that doubled positive integer space is all you needed, in which case congrats).

But all of the other problems are still there. That's why I'm saying that "unsigned by default" is backwards. It should be "signed by default", and unsigned for special situations that warrant it.


> I can't think of a single time where I've dealt with numbers that exceed a trillion in either direction.

A big one is dealing with currencies like Japanese yen in financial software.


Absolutely! But how about a quintillion? Signed 64-bit integers give you 8 quintillion in the positive integer space, and unsigned gives 16 quintillion.

The question is: Will the difference between 8 and 16 quintillion matter enough to say "unsigned by default"?

But currency types need to be signed anyway, so I guess that's that ;-)


Unsigned does not mean "not negative", it means "modular arithmetic".

The only few cases you want that are e.g. hash functions, crypto, etc. In all the other cases it's a mistake, and the "patterns" shown in this article to circumvent the issues with unsigned just for the sake of using it are extremely ridiculous I think and much less readable than the normal code using signed integers.

If you have things that must never be negative, define a type that does that and gives an error (there are many good safe_int examples in C++) whenever an operation gives a negative number as in that case you've already lost and your business logic / input filtering / .... is wrong and needs to be fixed.

Anecdotally, I've written a few hundred thousand kloc of c++ so far and I've never ever had a bug due to signed overflow. Unsigned underflow OTOH... I'm just thankful for clang's ubsan to warn on it because of how many issues it caught. For the immense majority of programs you'll never have sizes close to 63 bits anyways since the CPUs we use barely have 52-bits of address space at most (and if you have more you're likely already using 128 bit sizes)

> Where unsigned does benefit here is when these are used as indices into an array. The signed behavior will almost certainly produce invalid indices which leads to memory unsafety issues. The unsigned way will never do that, it’ll stay bounded, even if the index it produces is actually wrong.

I'd take a negative index over a silently wrong index any day of the week. The first will be caught hyper quickly; the second will send money to the wrong account silently for a couple of years before anyone notices, and then you're in much deeper trouble.


The suggestion of using an unsigned loop counter starting at size-1 and decrementing until “i<size” is no longer true starts this article off with a “this is bad advice” flavor to it in my book. If I encountered that code I’d stare at it way longer than necessary because I’d be trying to figure out what cleverness is hidden in the bizarre/backwards way the loop continuation test is written. I don’t know why you’d do this to someone else who has to read your code.


It's an idiom - obviously very little known, likely less known than comparing floats to themselves being a NaN check.

The standard idiom for doing a reverse loop would be: "for (int i = size; i-- > 0; ) ...". As a side benefit it has lower register pressure. Some folks have issues reading that one as well.

Saying all that while not being an 'unsigned' fan; the popularity of an idiom comes with its frequency of use, and given that 'unsigned' is not popular at all...
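
The idiom in context, as a C++ sketch (the function name is made up):

```cpp
#include <cstddef>

// "i-- > 0" reverse loop: i starts one past the last index, the
// decrement happens before each test, so the body sees size-1 down to 0
// and the unsigned counter never underflows (the loop exits first)
int reverse_sum(const int* data, std::size_t size) {
    int sum = 0;
    for (std::size_t i = size; i-- > 0; )
        sum += data[i];
    return sum;
}
```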


reverse iteration seems pretty basic …


> Unsigned does not mean "not negative", it means "modular arithmetic".

In which case it has a terrible name. The name clearly indicates that the primary feature is that it is not signed, and the logical conclusion from that is that it is not negative.

That said, I agree; using signed types in C/C++ does generally seem safer and a majority of the techniques listed in the article are unsafe, logically confused, and/or inhibit optimisation. Using signed types in languages like C# is also generally more pleasant, because otherwise you end up with a lot of casts just to interact with the base types (like arrays), language features and third-party libraries.


> The name clearly indicates that the primary feature is that it is not signed, and the logical conclusion from that is that it is not negative.

No, the logical conclusion is that since it has no sign, you don't know whether it's positive or negative. Otherwise the name would be "positive" or "non-negative", not "unsigned".

And this conclusion would also be correct.


Signed also tends to mean modular arithmetic. It's just with a different definition of modular arithmetic than the Euclidean one. See Boute's 1992 paper "The Euclidean Definition of the Functions div and mod" [1] for details on the various definitions.

Both signed and unsigned arithmetic are using finite fields, instead of Abelian rings as most people expect.

[1] https://biblio.ugent.be/publication/314490/file/452146.pdf


> Both signed and unsigned arithmetic are using finite fields,

This is not correct. They are rings with some additional (not mathematically standard) operations thrown in.

'Finite field' means something very specific, and quite different. https://en.wikipedia.org/wiki/Finite_field

Computer integer arithmetic hardware can be used to implement finite field arithmetic, but even the 2^n case takes quite a bit of trickery. https://en.wikipedia.org/wiki/Finite_field_arithmetic


>The name clearly indicates that the primary feature is that it is not signed, and the logical conclusion from that is that it is not negative.

Years ago, when the names were coined, CPUs didn't even have proper signed instructions. There was a sign flag, and that was all. In that regard unsigned was the natural, CPU-sympathetic type.


And yet the default type for variables and function arguments in C was (before it got deprecated) signed int, not unsigned.


signed/unsigned stuff predates C, though. My (lack of) experience with C came much later, when I had used assembly for 3 different CPUs already, so no recollection of the times.


If you mean the default being signed, it's there because B - the predecessor of C - only had a signed integer (machine word) type. But note that this is about the case when the variable has no type at all - e.g. "auto x" is a signed int. They still had the choice of "int x" being equivalent to either "signed int" or "unsigned int" by default, though - since there's no backwards compatibility issue there - and chose signed.

FWIW, signed seems to be the default for most other languages from that time period, and often the only option - e.g. Algol-60 didn't have unsigned integers at all. Pascal kinda sorta did, if you defined an integer subtype with 0 as the lower bound... but the upper would still be that of signed int, and it'd behave as such. And Algol-68 called its unsigned type BITS - a pretty strong hint that they didn't think of it as arithmetic.

I'm actually kinda curious now as to when the concept of unsigned integers first appeared in a high-level programming language first.


Interesting perspective. I don't have the same experience. To me a wrapping 32-bit index on a 64-bit machine is much more likely to segfault, since it reads 4 billion values forward rather than a few bytes backwards, and therefore I find unsigned index wraps easier to debug. In all fairness, wrapping indices are an extremely rare bug for me, so I can't say I have much experience debugging them.


I think when you use a variable that can have values in the range of millions, it's prudent to use 64-bit types and not 32-bit unless you are very much memory-constrained. And I've never seen a 64-bit signed variable overflow.


> In all fairness, wrapping indices are an extremely rare bug for me, so I can't say I have much experience debugging them.

Do you run your code with -fsanitize=undefined -fsanitize=integer ?


No I do not. I run ISO conformant C without options.


You really should try it on your codebase if you have the occasion


Unsigned means "modular arithmetic with a modulus that you do not choose and that is implementation-defined unless you have pinned down the bit width of the unsigned type, and that has implicit conversions from signed types that drastically alter the value".

Note that when mathematicians work in the area of congruences, they do not use unsigned integers.

For instance, when we look at Euler's Theorem:

https://en.wikipedia.org/wiki/Euler%27s_theorem

all the quantities in the formula can be understood as just integers, not unsigned integers.

The exponentiation of a can be understood as regular exponentiation. The modularity plays out in the triple-equal-sign operator and the (mod n) parenthetical part which says that the two sides of the equation are equivalent in a particular way.

The left side of the equation uses ordinary exponentiation and can produce values >= n; it is not wrapped.

It's crystal clear that we cannot replace the triple equal sign with the regular one, and drop the (mod n).


They also don't use signed integers, in the sense that computer languages use the term. Signed integers are also bounded in extent, and for most systems wrap on over/underflow. Euler's Theorem doesn't hold for them any more than it does for unsigned integers.

Don't treat computer "integers" like mathematical integers. It leads to pain. They're members of finite rings.


>Unsigned does not mean "not negative", it means "modular arithmetic".

Signed is modular arithmetic with an offset.

An unsigned char called i can be thought of as "i mod 256"

A signed char called i can be thought of as "((i+128) mod 256) - 128"

All results for over/underflow hold for 'i' cast into a char if you do the above in both cases.

I think the premise of the article is quite reasonable. It's slightly easier to reason about over/underflow for unsigned arithmetic since it's not offset.
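
That offset view can be checked directly; a C++ sketch using a true mathematical (non-negative) mod, with a made-up helper name:

```cpp
// the "((i + 128) mod 256) - 128" view of a signed char;
// the double-% trick makes the mod non-negative for negative inputs
int as_signed_char(int i) {
    return ((i + 128) % 256 + 256) % 256 - 128;
}
```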


> Signed is modular arithmetic with an offset.

it is not, it models integers (Z). The model only works in the bounds of what the platform can offer of course ; when you use "int" you say "this is an integer. In any case I'm supposed to encounter, adding two positive integers will yield a greater positive integer ; if I add numbers so big that my platform cannot represent the sum my program is meaningless anyways".

It is very unlike unsigned which models (much more accurately since it's much easier for our computers with finite memory) Z/pZ.


Signed and unsigned are actually treated internally the same on computers.

The reason you see -2 printed out when you add signed chars -1 and -1 together is because adding 255 and 255 together gives 254 under mod 256. The signed model that allows negative numbers relies on the fact that all numbers on computers are modular arithmetic.

So in that sense any argument that unsigned is modular and signed isn't is incorrect. You're better off choosing the one that forces you to consider how the computer actually operates under the hood.


> Signed and unsigned are actually treated internally the same on computers.

if you are writing code in C or C++, the computer you are programming for is the C/C++ abstract machine. C++ recently sanctified two's complement as the integer representation, but C does not, and supports machines with one's complement representation.


> C++ recently sanctified two's complement as the integer representation

While that's true, signed integer overflow on arithmetic operations remains undefined behavior.


One's complement is on its way out in C too. The next C standard will require two's complement.


> Unsigned does not mean "not negative", it means "modular arithmetic".

It does mean "not negative" in languages with bounds checks. Like Rust.


Is it called "unsigned" though?

In Racket (and in mathematics) we call those natural numbers. The very notion of "unsigned" was invented by video game programmers at Bell Labs. In mathematics positive numbers are still signed (they just have a positive sign).


> Is it called "unsigned" though?

Yes: https://doc.rust-lang.org/std/primitive.usize.html

> The pointer-sized unsigned integer type.

> In Racket (and in mathematics) we call those natural numbers.

Naturals have no upper bound. Most languages don't have naturals, because they do have upper bounds. Languages which do provide naturals (limited by the host machine) usually don't bother with signing segregation, and give you mathematical integers (again limited by the host machine's capabilities).


> The very notion of "unsigned" was invented by video game programmers at Bell Labs.

PL/I had (has) unsigned integers back in the 60s, picking up the idea from earlier languages. I believe the novel datatype in PL/I was the CHARACTER type; into the 80s I was still programming machines with variable length bytes.

In many ways PL/I heralded the current Ordovician stage of programming languages, with the enormous Cambrian flourishing of hardware-specific (and company-specific) languages slowly dying out. Interestingly the only surviving languages older than it, FORTRAN and LISP, are also machine-independent.


Unsigned types are supported in machine language.

Any language that offers access to features of the machine will support unsigned numeric types.

No language supports natural numbers. The best you can get is bignums. A language that calls its numbers natural is just lying. (Likewise, integers, and reals.)

"int" is not a lie, it is a hint.


Though, in idiomatic Rust I did not find the use of unsigned as mentioned in the article, signed for indexes is the norm.


> in idiomatic Rust I did not find the use of unsigned as mentioned in the article, signed for indexes is the norm

Huh? If you write things[n] in Rust, and n isn't usize that's an error, you can't index things with the other primitive types at all out of the box.


Wait, what?

Pretty much everything for indices (at least in the stdlib) seems to be 'usize' in my experience (i.e. Vec), which is unsigned.

Am I missing something?


> Unsigned does not mean "not negative", it means "modular arithmetic".

Unsigned does mean non-negative (or rather, it carries no indication of sign), since the sign of a number is just its positive or negative factor: 1 or -1.


> almost all code that uses signed integers to represent values that will never be negative, tends to have a cacophony of range assertions and other tests which are just as error-prone as bounds checking to remembering to write, but also maintain when refactoring. It’s truly underappreciated how much those tests can be eliminated if your integer can never actually become negative due to the type system itself.

This is something the Ada language gets right: you define your integer type with a range, and the compiler automagically inserts the runtime range checks. (Unless you switch them off with compiler directives.) It makes far more sense to bake the range-checks into the type than to do it the manual way and just hope you don't miss anything.

You can do this in C++ using a library (thanks to templates) [0] but this is very rarely done.

[0] https://www.boost.org/doc/libs/1_78_0/libs/safe_numerics/doc...


Ada provides integers and floats with range checks, and you can specify the underlying size, but intermediate values don't get checked. It also has a separate form of unsigned integers called "modular types" which expressly indicate wrapping, if that's your intent.

    -- Range-checked: only values 0 .. 15 allowed, stored in 64 bits.
    type Nibble is range 0 .. 15 with Size => 64;

    -- Every assignment is as if Value := (Value % 16);
    type Wrapped_Nibble is mod 16;
    type Small_Float is digits 4 range 0.0 .. 20.0 with Size => 32;
I agree that the subexpression issue is problematic, IIRC I thought this was checked in SPARK code, but I could be wrong. The big benefit is describing your intent and you know that the stored value is in range, though whether it went outside of that range in calculation might not be known. Another thing is that these types define 'First, 'Last, 'Pred, 'Succ, and 'Range attributes which you can use.

    for N in Nibble'Range loop
        -- do something
    end loop;


> I thought this was checked in SPARK code, but I could be wrong

I believe that's correct, see my other comment. [0] Does still feel a bit ugly though.

[0] https://news.ycombinator.com/item?id=29768848


Range types are not a panacea, especially combined with other arithmetic operators. There are essentially two options: [lo1, hi1) + [lo2, hi2) can result in a new type [lo1 + lo2, hi1 + hi2) and so on, or any operation reverts back to the base type. Only the former provides a static type guarantee but can be extremely annoying to use and you eventually have to constrain the type back with runtime checks. (Interval arithmetic also shares this problem.) Probably for this reason, to my knowledge, both Pascal and Ada implements the latter, which is more usable but also arguably useless: if your intermediate type is ranged you have to manually calculate correct bounds (a failure to do so will cause a runtime error), and if it's unranged there is zero guarantee even in the run time.


Somewhat related: there's apparently a defect in the Ada language spec which permits reordering of arithmetic operations such that out-of-range behaviour may hinge on the compiler's whim: [0]

> non-parenthesized arithmetic operations could be re-ordered by the compiler, which may result in a failing computation (due to overflow checking) becoming a successful one, and vice-versa. By default, GNATprove evaluates all expressions left-to-right, like GNAT.

Presumably one solution would be to use a three-address-code style [2] to completely tie the compiler's hands regarding ordering, but this seems painfully restrictive even by the standards of SPARK.

Also, an Ada compiler's internal choice of base type (with which to implement a range-based integer type) can impact overflow behaviour: [1]

> The choice of base types influences in which cases intermediate overflows may be raised during computation. The choice made in GNATprove is the strictest one among existing compilers, as far as we know, which ensures that GNATprove’s analysis detects a superset of the overflows that may occur at run time.

I don't know if there's a fully portable robust answer to this second problem. (I believe you can generally use hints to force the Ada compiler to use a particular sized type, along the lines of C's uint32_t, but that this isn't portable.)

edit On second thought, I imagine using a three-address-code style would solve this too. Either the result falls within the permitted range of the destination variable, or it doesn't.

[0] https://docs.adacore.com/spark2014-docs/html/ug/en/appendix/...

[1] https://docs.adacore.com/spark2014-docs/html/ug/en/appendix/...

[2] https://en.wikipedia.org/wiki/Three-address_code


For me, the main reason to prefer unsigned integers whenever possible is: less special cases.

When a signed integer is used as an array index, the value can be in one of three ranges: the <0 range, the >=0 && <size range, and the >=size range. To validate the index, you need two comparisons. When an unsigned integer is used as an array index, there are only two ranges: the <size range, and the >=size range, and you need only a single comparison.

And that's not the worst situation with signed integers. From a more general point of view, there are four "classes" of signed integer: positive (>0), zero, negative (<0), and INT_MIN. Everybody tends to forget about that last one, but it's "special" in that it can break things in unexpected ways. Negating it doesn't work (you'd expect negating any number less than zero to result in a number greater than zero, but for INT_MIN that doesn't happen). Dividing it can trap (see for instance https://kqueue.org/blog/2012/12/31/idiv-dos/ which has a couple of examples) or worse.

With unsigned integers, there are only two "classes", zero and non-zero, and it's common to not even need special treatment for zero, reducing the whole thing to a single "class" of values.
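To make the comparison-count point concrete, here's a minimal C sketch (the function names are mine, not from the comment):

```c
#include <stdbool.h>
#include <stddef.h>

/* Signed index: two comparisons are needed to validate. */
bool valid_index_signed(long i, size_t size) {
    return i >= 0 && (size_t)i < size;
}

/* Unsigned index: a "negative" value has already wrapped to a huge
   number, so a single comparison rejects both failure modes. */
bool valid_index_unsigned(size_t i, size_t size) {
    return i < size;
}
```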


>> To validate the index, you need two comparisons.

If your code doesn't have bugs, no validation of indexes is required.


> If your code doesn't have bugs, no validation of indexes is required.

Your code doesn't have bugs because you validated your indexes. What you said makes some sense when the index comes from within the program itself, but not when it comes from outside the program. Consider for instance the case of reading a data structure from a file or the network, where one field is an index or an offset into another part of the structure; you must validate that this index or offset is not outside the bounds, no matter how perfect your code is.


The number of programs ever written without bugs can be counted in a uint8.


While wrapping parts of OpenSubdiv for Rust [1] I noticed the C++ API was littered with uses of int for indexing/lengths (both are always positive) of geometry buffers.

So I opened a ticket with Pixar [2].

Some of the same strange arguments were brought forward (incl. the Google style guide BS). I wrote a longish reply [3] that mentions two comments [4,5] from an ill-guided article promoting use of signed in C++.

I closed the ticket in the end because I felt that if people really believe there are good reasons for signed in the cases at hand they haven't grokked the problem sufficiently – trying to convince them otherwise would not be a good use of my time.

[1] https://crates.io/crates/opensubdiv-petite

[2] https://github.com/PixarAnimationStudios/OpenSubdiv/issues/1...

[3] https://github.com/PixarAnimationStudios/OpenSubdiv/issues/1...

[4] https://www.learncpp.com/cpp-tutorial/unsigned-integers-and-...

[5] https://www.learncpp.com/cpp-tutorial/unsigned-integers-and-...


Quoting one of these comments:

> The argument with subtracting 5 from 3 is not an argument against using unsigned. When you use signed you can run into a similar problem when the result underflows. This happens much less frequently and causes UB at INT_MIN, which is the worst kind of bug. The bug is often not triggered during testing and you will only notice it when it is too late.

The "When you use signed you can run into a similar problem" part is just not true in our world. 3 - 5 happens much more often than anything that may cause signed .*flow.

    "signed underflow" "bug report"
yields 8 results on google FFS


No such thing as "underflow" for integers. Signed integers overflow in both directions. Underflow is a floating point concept.

Edit: https://en.m.wikipedia.org/wiki/Arithmetic_underflow


So what about an unsigned int going below zero? I think most in the comments here mean that by "underflow", but I could be mistaken (and it should correctly be called "overflow" as well).

I agree with the WP article saying integers more correctly "wrap around" (than overflow).


The C standard only describes integer arithmetic as "overflowing" when it results in undefined behavior. In that sense unsigned integer arithmetic never overflows; it "wraps around", and the behavior there is defined.

I can't actually find a normative definition for "overflow" in the C or C++ standards, but there are sections and notes that describe this interpretation:

C++: https://eel.is/c++draft/full#basic.fundamental-note-2

C: http://web.archive.org/web/20211231011139/https://cigix.me/c...

There can be subtle differences between various definitions of "overflow", and I imagine there could be other subtly different definitions outside C and C++. But "underflow" is exclusively a floating point concept, as far as I know.


API contracts aside, signed indices do have a major benefit: https://godbolt.org/z/q4eTdPdcd


Thanks! If you don't mind explaining, what are we seeing here?


After the initial setup block,

    c1 = block[i1]; c2 = block[i2];
    if (c1 != c2) return (c1 > c2);
    i1++; i2++;
becomes

        mov     cl, byte ptr [rdx + rcx]
        cmp     byte ptr [rdx + rax], cl
        jne     .LBB0_6
        lea     eax, [rdi + 2]
        lea     ecx, [rsi + 2]
if the indices are unsigned, but

        mov     al, byte ptr [rcx + rdx + 1]
        cmp     byte ptr [rdi + rdx + 1], al
        jne     .LBB1_6
if the indices are signed, that is the compiler just removes the indices and traverses the block directly (starting at the input offsets).

That's because in C (and C++) signed overflow is UB, so the compiler can assume it doesn't happen. Clang is apparently unable to make such a determination or work around it for unsigned, so for every increment of the indices it actually goes and computes the indices to fetch the items from the array.

Incidentally, GCC does not care and generates the exact same (rather different) code for both signednesses.


In this case the problem is more specifically the use of uint32_t. I never quite spelled out in the article that if you're going to use unsigned integers for this sort of thing, you should always use the native-word-size unsigned type, e.g. size_t. If you actually change that to uint64_t (or size_t) the code generation is better, as you can see here: https://godbolt.org/z/11nEz6EPT


I see no reason why the same addressing mode could not be used in both versions. This just seems like an optimization that has been made under the assumption that in the addressing mode

    [reg1 + reg2 + disp]
reg2 refers to a signed integer (Which may result in the effective address being < reg1 if reg2 is negative).

Obviously you would not want to make the same assumption if reg2 refers to an unsigned integer, because if the most significant bit is set, it would result in an effective address below reg1, which is definitely not what we would expect from adding an unsigned integer.

But in this case, 64-bit addressing is being used, and the value of reg2 is a 32-bit integer which has been specifically zero-extended. We can make the assumption that [reg1 + reg2 + disp] can never result in an effective address below reg1 (assuming positive disp).

If you take the signed version, and replace the lines

    movsxd  rdi, edi
    movsxd  rcx, esi
with

    mov  edi, edi
    mov  ecx, esi
I believe you will have something functionally equivalent to the unsigned version.

Of course, this same optimization could not apply if we were using uint64, but it could in the case of int64.

Either way, buggy code with no bounds checking is not a reliable method of determining what optimizations can be done in production.


The code is compiled as it is in order to handle the case where the indexes need to wrap around. Suppose i1=0xffffffff on entry - you then want to access block[0x00000000ffffffff], block[0x0000000000000000], block[0x0000000000000001]. I've written out all 64 bits of the offset, in the hope this makes the problem clearer: there's no addressing mode for getting the 32 bit truncation. That's why there's an LEA after each access. (Note that they write to the 32 bit part of each register.)

In the int32_t case, suppose i1=-1. Then you want to access block[0xffffffffffffffff], block[0x0000000000000000], block[0x0000000000000001], and so on. You can do an initial sign extension of the indexes, and then the base64+index64+disp32 addressing mode gives you the right result.

Make the unsigned version take uint64_t indices, and you get the same code for both. Or #include <stddef.h>, and make it take size_t - same thing. This is exactly the sort of thing that size_t is there for.


Author here. I believe many of the complaints here are concerned with what is more common, rather than with what is universally better over all possible inputs, which is the point of view this article is written from. The issue with the "general case" is that exploits are never actually found there; they're always found in the edge cases, such as large or "pathological" values. Signed integer arithmetic has more of these edge cases than unsigned when it comes to sizes, indices, and offsets used in expressions controlling memory (directly or indirectly), which is the most common application of integers in a codebase. The native-word-size unsigned integer type covers the full numeric range for those operations, while signed simply cannot. It also excludes a whole numeric range that is simply incompatible with those operations (negative values), and with it a whole class of issues related to undefined operations. The only real edge case where unsigned is more error-prone is values close to zero under subtraction, and that's relatively easy to account for, which I go into great detail to explain.


Which is why the only sensible option is a proper numeric tower.

Then the common cases are fast and the edge cases slightly slower, but everything works the same.


The reason I've always preferred unsigned is that it's easier to write code that is correct for the full input domain.

Consider for example:

  extern void g(int); 
  void f(int x, int y)
  {
      g(x - y);
  }
The function f obviously doesn't work for all values of x and y: Some values will overflow, some values will underflow.

How do you fix that? Writing an explicit check for whether x-y overflows is highly impractical, no one does that.

Or you could document the limitation using pre- and postconditions that narrow down the range of allowed values to something you know will work:

  extern void g(int); 
  void f(int x, int y)
  /*
      PRECONDITION: -1000000 <= x <= 1000000 and -1000000 <= y <= 1000000
  */
  {
      g(x - y);
  }
No one does that either. I don't know why. There's no principled reason why you couldn't meticulously keep track of signed int input and output ranges. But that's how it is, no one does it, valid int ranges are always unstated.

Now for the unsigned version of the same problem:

  extern void g1(unsigned); 
  extern void g2(unsigned); 
  void f(unsigned x, unsigned y)
  {
      if(x >= y)
          g1(x - y);
      else
          g2(y - x); 
  }
Assuming g1 and g2 are well-defined for all unsigned inputs, so is f. That was easy.

It may look like more code, because there are two g functions now. But chances are you probably need to treat the x<y case differently somewhere anyway; in the larger scheme of things, unsigned is not more code.


What you should be doing for checked arithmetic with GCC is use the builtins for those that aren’t aware;

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...


It would be nice to see something like that standardised so you could use it in portable code.

It doesn't solve the problem though. You still need to write some code for what to do if there's an overflow, and you need separate handling for underflow. So the signed case is now:

  extern void g1(int); 
  extern void g2(int); 
  extern void g3(int); 
  void f(int x, int y)
  {
      int x_minus_y;
      if(!__builtin_sub_overflow(x, y, &x_minus_y))
         g1(x_minus_y);
      else if(x > y)
         g2(...); /* overflow */
      else
         g3(...); /* underflow */
  }
Work in progress. I've given up on figuring out what arguments to pass to g2 and g3. Since the difference won't fit in an int, you would need to offset-adjust the value somehow in order to fit it in an int. Seems messy. Maybe you can think of something simpler.


Tangentially related:

https://docs.microsoft.com/en-us/answers/questions/680467/ex...

Had Microsoft used an unsigned int to store the date - which can't ever be negative in this notation - Exchange wouldn't have stopped working worldwide on 1/1/2022. They would have been safe for another two decades, in which they would hopefully had either gotten rid of that weird code or switched it over to 64 bits.


Had Microsoft used common sense, they would not store the string representation of a date as an integer, period.


It depends on the use case. If this is an index that's commonly sorted by and enumerated, the performance impact of a string compare versus a simple 64 bit integer compare would be enormous. A single compare instruction versus a series of unicode character comparison operations with subsequent jump instructions would not only be more code, it would also be more likely to wreck the branch predictor of the CPU with all the added conditionals.

I think using 64 bit integers for their natural hardware-accelerated sortability is quite a neat trick. You just have to be sure to use actual 64 bit numbers, and you have to think ahead to check if your data type is wide enough to actually contain your data.


> They would have been safe for another two decades, in which they would hopefully had either gotten rid of that weird code or switched it over to 64 bits.

Why? The code would have kept working all along, there would have been no reason to explore it, and lots of other issues to chase.


This ought to be the canonical example for this sort of thing -- in the sense the unsigned value only* 'helps' if you are already doing something terminally silly.

*to some definition of 'only' that excludes the cases like cryptography, hardware, etc stuff.


This is a misguided and dangerous article.

Suppose that a, b and c are small values close to zero and so can be combined additively/subtractively in any combination without overflow. If they are signed, we can rely on algebra, like:

  a - b > c       // this expression
  a - b - c > 0   // can become this expression
  a - c > b       // or this
This breaks for unsigned; you cannot just change the sign of c and move it to the other side of the inequality.

Even when you're not actually doing this algebra in the code, it's useful to be able to do it outside of the code. Sometimes in the code too.

For instance, suppose now that the values are not trivial. We have no assurance that a - b will not overflow, but we do have assurance from somewhere that a - c will not. The rewrite a - c > b is then helpful in avoiding overflow, and easily provable to be equivalent to the a - b > c expression that naturally comes from the problem specification.
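A small C illustration of the rearrangement breaking near zero (the helper names are illustrative):

```c
/* Both forms of the comparison, with unsigned operands. Algebraically
   equivalent, but near zero the subtraction wraps. */
static int original_form(unsigned a, unsigned b, unsigned c) {
    return a - b > c;   /* for a=1, b=2: 1 - 2 wraps to UINT_MAX */
}

static int rearranged_form(unsigned a, unsigned b, unsigned c) {
    return a - c > b;
}
```

Away from zero the two forms agree, which is exactly what makes this easy to miss in testing.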


> It’s very trivial to check if x * y would overflow though and you should just learn the extremely simple and obvious test.

    if (y && x > (T)-1/y)
I think it's sarcasm, but with C guides like this it's never certain :)


Maybe this is not a problem for calculating the size of a memory allocation, but that division operation is going to be several times slower than the multiplication it is guarding. Is there a faster way to check for overflow, perhaps using compiler builtins?
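For what it's worth, on GCC and Clang the checked-arithmetic builtins avoid the division entirely; a minimal, non-portable sketch (the wrapper name is mine):

```c
#include <stdbool.h>
#include <stddef.h>

/* GCC/Clang builtin (not standard C): reports overflow from the
   multiply itself, so no guarding division is needed. */
bool mul_would_overflow(size_t x, size_t y, size_t *product) {
    return __builtin_mul_overflow(x, y, product);
}
```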


Maybe I'm just a moron but what makes this test "obvious"? What is T?


i imagine it is the type of x and y


I agree that signed is generally inferior and less often what you really need. For me, in C and C++, the main arguments for 'unsigned by default' are:

(1) signed integer overflow/underflow is undefined behaviour. You don't want that, it's evil. Unsigned does not have that problem (often it is still a bug if you over- or underflow -- but at least you get sane code from the compiler).

(2) size_t is unsigned and is returned by sizeof() and is the argument to memcpy(), strcpy(), etc., and with -Wconversion, you do not want to handle all the warnings when mixing this with signed loop counters or size variables.

Unfortunately, one problem with using unsigned by default in C is that ptr1-ptr2 yields ptrdiff_t, which is signed. Which is bad, in my opinion, because I usually want to find an array size or index when I subtract two pointers, so I want size_t, and in these cases, I always know which pointer is larger.

Oh well, it's C.
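A small sketch of the ptrdiff_t wart (the helper name and its precondition are mine):

```c
#include <stddef.h>

/* Pointer subtraction yields the signed ptrdiff_t; when you know which
   pointer is larger, a cast recovers the size_t you usually want. */
size_t element_distance(const int *hi, const int *lo) {
    return (size_t)(hi - lo);  /* precondition: hi >= lo, same array */
}
```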


The one thing I really, really hate Java for is its lack of unsigned integers (there are a lot of other minor inconveniences, like in any language, but nothing comes close to the missing unsigned types).

This of course has to do with me frequently writing and maintaining low-level hardware and network protocol code, talking to devices on the other end that had firmware written in C and therefore expect you to be fluent in bit manipulation. But hey, all that nice software is useless if it can't eventually interface with the real world out there, so hardware will stay relevant and necessary and the accompanying close-to-the-metal code can't be optimized out of the system, hence I consider good tools for doing bit-level work an important quality in any general-purpose programming language. And besides bit-level operators that means support for unsigned integers which are needed to easily combine bit-level with classic arithmetic operations.

You can actually live without either of these in practice and still do low-level stuff, but it is very unintuitive, has poor runtime performance and leads to hard-to-read and bug-prone code.


While Java doesn't have unsigned types, it at least with java8 got functions to operate on ints as if they were unsigned.

But Kotlin (also running on the jvm) got unsigned types with built in operator overloading for them, making it easy to work with. "val a = 100u" gives you an unsigned Int. If you use Java, there are few reasons not to use Kotlin. https://kotlinlang.org/api/latest/jvm/stdlib/kotlin/-u-int/


> has poor runtime performance

Are you referring to performance issues caused by lack of unsigned types? Do you have a specific example? Based on my observations, the conversions necessary for acting on signed types as if they were unsigned generates the correct machine instructions designed for unsigned types.


I really don't buy that it is a "trivial idiom" to check for unsigned underflow by writing 'i > size'. I'd have to check that was right all the time.

Also, in general, saying all we have to do is (I'm paraphrasing a bit) "not make mistakes" is, to me, a bad idea, as all programmers make mistakes.

I sometimes assert after a calculation that a value is >= 0; it's much harder to check that if it has wrapped around because it was unsigned.


> for (size_t i = size - 1; i < size; i--) {

Am I alone here in feeling that this, while working, is anything but intuitive and is asking for trouble down the road? Just imagine having to change the countdown in maintenance, for whatever reason, to stop at 1 instead of zero, and suddenly you're fiddling with the comparison operator.


You are not alone! This is completely unclear: the next guy reading your code will definitely "fix" the halting condition. I'd either handle i == 0 outside of the loop, or use something like:

    for (size_t j = size; j > 0; --j) {
        size_t i = j - 1;
        // etc.
Or even:

    for (size_t j = 1; j <= size; ++j) {
        size_t i = size - j;
        // etc.


I agree with this almost entirely, and I use almost exclusively unsigned ints in C.

However, it's not true that unsigned is faster because the compiler can optimize. Consider:

x = (x * 2) / 2;

If x is unsigned, and overflows, x will not be the same after this operation so the compiler can not optimize away this line. If x is signed and overflow is undefined, the compiler can assume it wont overflow and optimize away the line. UB affords the compiler a lot of optimizations.
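The wrap-around that blocks the unsigned optimization is easy to demonstrate (assuming a typical 32-bit unsigned int; the function name is mine):

```c
#include <stdint.h>

/* (x * 2) / 2 is NOT the identity for unsigned: wrap-around is defined,
   the high bit is simply lost, so the compiler must keep the arithmetic. */
static uint32_t double_then_halve(uint32_t x) {
    return (x * 2) / 2;
}
```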


> it's not true that unsigned is faster because the compiler can optimize.

It's not false either. For instance, casting int32_t to int64_t requires an instruction like movsxd, which sign-extends. Casting uint32_t to uint64_t can often be merged into the instruction that computed the uint32_t value, because an instruction writing a 4-byte register like eax zeroes out the upper 4 bytes of the corresponding 8-byte register rax.


Thanks! I didn't know that!


The article acknowledges this:

> There are some optimizations compilers can make assuming signed integers cannot underflow or overflow that unsigned does not get to participate in.


The argument is really strange. It says that unsigned ints can't become negative and it's a good thing. But they do that by instead having crazy high values. How is that a win?

You could argue that it doesn't matter because an incorrect program is incorrect regardless. True. At least with signed integers you can represent differences between positive values. How do you do that with unsigned?


Signed integers may also have crazy high values so you're already on the hook for dealing with those. The case for unsigned values is that it frees you from having to think about negative inputs. We all know to check for divide by zero; how many remember to check for INT_MIN / -1?

Regarding differences, in practice you usually want those differences to be non-negative. With uints, you compare first. Not:

    int length = end - begin;
    if (length < 0) return error;
but:

    if (begin > end) return error;
    unsigned length = end - begin;
This is also a good habit for signed ints, since it avoids underflow in more (but not all) cases.

An old post with a few fun crashes for negative inputs: https://kqueue.org/blog/2012/12/31/idiv-dos/
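A sketch of the INT_MIN / -1 guard mentioned above (the helper name is mine):

```c
#include <limits.h>
#include <stdbool.h>

/* Guarded signed division: both b == 0 and INT_MIN / -1 are undefined
   (the latter traps with SIGFPE on x86). */
static bool safe_div(int a, int b, int *quot) {
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *quot = a / b;
    return true;
}
```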


Might be of interest: https://www.youtube.com/watch?v=Puio5dly9N8&t=587s IMHO Bjarne's comment sums it up nicely ...

"Use ints until you have a reason not to. Don't use unsigned unless you are fiddling with bit patterns, and never mix signed and unsigned.."


Eh, but C++ std likes to use unsigned size_t for sizes...


Several comments in this thread take the POV that "integer overflows never happen in practice" but of course they do! https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=integer+ove...

Of course uints overflow too, which just means arithmetic is hard, and it's often (not always) made harder when you have to consider negative values. Bounds checking, average, division, etc. INT_MIN is a horrible edge case which is easy to forget.


I opened one at random and it's a uint overflow lol.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-1087...


The patch is completely bogus:

https://github.com/teeworlds/teeworlds/commit/4d529dcd2d0102...

Here it tries to avoid overflow by assigning the result of 32 bit arithmetic to a 64 bit type. That's a common mistake.
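The bug pattern described here, sketched in C (function names are illustrative):

```c
#include <stdint.h>

/* Bug pattern: the multiply is done in 32 bits and wraps BEFORE the
   widening assignment to the 64-bit type. */
static uint64_t buggy_product(uint32_t a, uint32_t b) {
    return a * b;             /* 32-bit multiply, then widened */
}

/* Fix: widen an operand first so the multiply itself is 64-bit. */
static uint64_t fixed_product(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}
```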


What's more important than preaching about signed vs unsigned is raising awareness not to use both.

Most signed/unsigned related bugs (outside numerical/math sensitive domain) are due to using both simultaneously and the side effect is of course going to be overflow/underflow.

On the other hand, I'd favor int over unsigned as it's more practical when dealing with memory address arithmetic & array indices due to address difference being signed.


The argument that most signed integers never use the negatives is a bit misleading, as I'm sure most unsigned integers never use the upper half of their range either


I'm in the "almost always signed camp" - at least in languages without robust over-/underflow checking.

But I sometimes wonder if it would have been better to not carry the 'signedness' in the variable type in high level languages, but instead treat integer variables as 'signless' bags of bits, and signed-vs-unsigned are just different views on those bags of bits, just as in assembly.

Most operations on two's-complement numbers result in the exact same bit pattern, no matter if the involved operands were 'signed' or 'unsigned', and the signedness only matters when printing the result, or for explicitly extending the sign bit when casting to a wider type.
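On a two's-complement machine the identical-bit-pattern claim is easy to check for addition (the helper name is mine; the signed inputs are chosen so the signed add itself doesn't overflow):

```c
#include <stdint.h>
#include <string.h>

/* Add -5 and 3 as signed, then add their bit-identical unsigned
   counterparts, and report whether the resulting bits match. */
static int signed_unsigned_add_bits_match(void) {
    int32_t sa = -5, sb = 3;
    int32_t ssum = sa + sb;              /* -2, no overflow */

    uint32_t ua, ub;
    memcpy(&ua, &sa, sizeof ua);         /* 0xFFFFFFFB */
    memcpy(&ub, &sb, sizeof ub);         /* 0x00000003 */
    uint32_t usum = ua + ub;             /* 0xFFFFFFFE */

    return memcmp(&ssum, &usum, sizeof usum) == 0;
}
```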


Because when C was standardized in 1989, there were a few vendors that wanted to support non-two's-complement hardware, and C has been stuck with that ever since. But more to the point, even on two's-complement hardware, the CMP instruction doesn't care about the signedness, but it typically sets flags (for those CPUs that have flags) for both signed and unsigned results, and there are different conditional branches one can take for less-than, less-or-equal, greater-or-equal, and greater-than.


I often wished I had access to the flag bits in C (or at least a somewhat portable subset). Would be very handy for writing CPU emulators :)


Having written a CPU emulator [1], I don't think it would have been of much use. The Intel CPUs typically don't set the flags on a MOV, but Motorola CPUs typically do.

[1] https://github.com/spc476/mc6809


I haven't seen anyone mention the main reason I end up using signed: to allow for an "out-of-bounds" value to indicate "value not set". As long as the N-1 bit version of the integer is large enough, not having to use an additional variable to indicate "var not set" is really quite lovely. One could potentially replace the use of a negative sentinel value with a maximum sentinel value, and I've done this on occasion too (i.e. if (var == 2^N-1) var-is-not-set). But the cognitive distinction between negative == not set, positive == valid value is easier to deal with (for me, at least).


This the proper place for std::optional<T>.


That has quite the overhead for something that can be done using a single bit of the actual value. Roughly half the programming I do is high-performance, real-time constrained; std::optional is not really appropriate in that context.


There is a proper place for everything, and others for everything else.

You can make a zero-overhead analog of optional<int>, for such cases, for clarity and safety. (Unless you are also trying to present a C API for cross-language compatibility...) But it is tricky to determine the exact amount of handholding to provide for such a type, and extra work to remember exactly what it is. We all know how ints work.


> delta = abs(x - y);

> The argument is that unsigned is dangerous here because if y > x then you get underflow. The problem with this argument is it’s not valid because the code itself is simply incorrect regardless of the signedness of x and y. There are values for both x and y which will lead to signed integer underflow. So like before, in languages like C and C++, you just unconditionally invoked undefined behavior since signed integer underflow is undefined.

Yeah because x=1, y=2 is just as likely an input as x=MIN_SIGNED_INT, y=1.


Low-level programming, dealing with hardware registers, is so often unsigned that I use unsigned everywhere I don't need negative numbers. YouTube's view counter going 64-bit was inevitable, but overflowing 2^31 shouldn't have been what forced it. https://arstechnica.com/information-technology/2014/12/gangn...


> // But it also even has an explicit n * m form as well.

> make([]Object, n, m);

make([]T, n, m) does not allocate a slice of length n*m, but of length n and capacity m.


Choose the right tool for the job. Examples:

  - Loop using an iterator: pointer-sized unsigned (size_t)
  - Array index: pointer-sized unsigned (size_t)
  - Enum used to represent a register value that's 8 bits or less: unsigned 8 bit
  - ... used to represent a register value that's up to 16 or 32 bits: unsigned 16 or 32 bit number
  - Integer value that could be negative: Signed, with the appropriate size
  - Doing numerical operations and you have an FPU: Float. Unless you're careful, you know what you're doing, and it's either performance critical or you're using a dedicated fixed-point hardware peripheral... then maybe fixed point with a signed or unsigned integer.
  - Analog value that can only be positive; maybe a voltage: unsigned integer according to precision of the ADC etc
  - Analog value that can be positive or negative; maybe pH, audio, or something using a differential ADC: signed integer
  - Performing mathematical operations including subtraction, don't expect fixed-point errors to come up, and you don't want to use floating point or don't have an FPU: signed integer


I mostly use signed... but one of my primary activities is to use uint8, uint16, and uint32 for indexes and indexes into indexes etc. The workflow was designed around the various uint size limitations so that they are not exceeded. Uint32 is the largest chunk size that I use. The fact that it is twice as big as an Int32 is handy. It allows me to pass bigger chunks to the GPU. Plus I'm already working with uint8 and uint16 so it's consistent. I've not needed Uint64 or Int64 but it would vastly increase memory size for no benefit for my use case. Memory is cheap but I'm bottlenecked by memory bandwidth. I've looked into using variable ints but I haven't needed them yet. For things that I know I'm going to scan over having compressed variable ints of a diffed sorted bag would be great but I would need to write a ton of custom CPU and GPU code for it. You can index into variable ints with a post process provided a maximum number constraint and some padding on the end.

This sort of code is written very carefully and mistakes give completely wrong answers so they are noticed right away.


Signed integer overflow causing undefined behaviour allows the compiler to make significantly better optimisations. Here is Chandler Carruth, a Clang developer, talking about it: https://www.youtube.com/watch?v=yG1OZ69H_-o&t=2356s&ab_chann...


I am surprised to see what I consider to be the most important part of signed integers in C and C++ only barely mentioned:

>use of signed integers is better as it’s the only way you can get trap behavior for integers, as using it on unsigned would trigger trap representations for valid code that relies on that behavior.

For video and audio codec code, and really anything integer math heavy, being able to use UBSan to find overflows is a hugely beneficial tool, and something you can normally only do with signed integers due to the C spec (it's less of an issue in Rust as that language makes both signed and unsigned overflow illegal).

>Trap representations are actually quite insufficient as they can only trigger at runtime when those paths are successfully executed with the correct trap-producing inputs. This coverage is impossible to expect in any non-trivial program even with exhaustive unit testing.

This was true before fuzzers existed, but now that we have several very good fuzzer implementations (plus a few smart unit tests), the coverage is well within reach.
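As a complement to UBSan, GCC and Clang also expose overflow-checking builtins, so the cases you anticipate can be handled explicitly rather than discovered by fuzzing. A sketch (checked_add is just an illustrative wrapper):

```cpp
#include <climits>

// __builtin_add_overflow returns true if the addition overflowed and
// stores the (wrapped) result either way; no UB is invoked. This
// wrapper returns true on success instead.
bool checked_add(int a, int b, int* out) {
    return !__builtin_add_overflow(a, b, out);
}
```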


> this concept applies to all languages, regardless of their choices

No it doesn't.

Neither signed nor unsigned integers in WUFFS permit overflow or underflow, at runtime the integers behave the way you were taught in school. Adding two positive numbers together always results in another positive number, because duh. WUFFS catches all the cases this programmer worries about at compile time and insists you fix them.

One of the first examples WUFFS presents you is what if we take this correct code for a well known operation and we tweak it so that it can overflow. The compiler rejects the modified code, pointing out that it could now overflow and so isn't valid WUFFS.

Now, WUFFS is a special purpose language, you can't rewrite your 5MLOC C++ system in WUFFS. But then again, this shouldn't make you feel glad (because C++ is so powerful) but instead sad (because C++ can't offer you this valuable defence against your own stupidity). It should also make you determined to use WUFFS everywhere you possibly could, although doubtless it won't.


> It turns out that computing differences safely is actually quite hard for signed integers because of underflow, even in languages which support wrapping behavior for them, e.g INT_MAX - INT_MIN is still going to be incorrect even. There just isn’t a trivial way to do this safely

And that right there is a compelling reason not to use C or C++. If the language is so broken that it doesn't even let me subtract signed integers safely without risking shooting myself in the foot it is worse than useless in today's world because it gives me the illusion of programming in a high-level language when in fact I am not.

When I want to subtract x from y, I want to be able to write y-x and have the language worry about the details. That is the whole point of having a high level language, to abstract away the low-level details and let you write high-level abstractions without having to worry about introducing bugs that could imperil your entire enterprise. C and C++ don't do that.
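For the record, a fully safe absolute difference can be written in C/C++, but you have to know to hop into unsigned arithmetic, where wraparound is defined; a sketch:

```cpp
#include <cstdint>

// Correct for all inputs, including INT64_MIN and INT64_MAX: the
// subtraction is performed in uint64_t, where wraparound is well
// defined, and the true difference of two int64_t values always
// fits in uint64_t.
uint64_t abs_diff(int64_t x, int64_t y) {
    return x > y ? (uint64_t)x - (uint64_t)y
                 : (uint64_t)y - (uint64_t)x;
}
```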


This is the argument not to use computers at all. It is not persuasive.

There is never a reason to use C.

But C++ is a language for engineers. If you don't have the patience to do engineering, then you should not. If you have a need for engineering anyway, hire one.


Part of engineering is choosing the right tools, and that means avoiding tools that have booby traps built in to their design unless there is a very compelling reason to accept that risk. C was a good tool when computers were much more resource-constrained than they are now and security was much less of a concern, but those days are long gone.


Part of engineering is understanding your tools, and not imagining they are other than they are. There is no risk when there is no delusion. A professional will always choose the sharpest tool, and respect it.

There are no booby traps, only booby programmers.


The problem is that C/C++ has hijacked a common notation used to denote subtraction and used it to denote something that kinda sorta looks like subtraction but actually isn't, and then it provides no backstop for the situations where it isn't. The specification of the language explicitly allows anything to happen if the programmer screws up this thing that the design of the language actively encourages you to screw up. That is terrible engineering practice. The design of engineering tools, like anything else, needs to take into account the fact that humans are fallible and will make mistakes. Good design guides people away from making mistakes and minimizes the consequences when people make mistakes nonetheless (as they inevitably will). The design of C/C++ lures people into making mistakes by making something look like something it is not, and then it amplifies those mistakes into consequences that are potentially catastrophic, all in the name of efficiency. That might have been a reasonable tradeoff in the 1970s. In the 2020s not so much.


Demands on efficiency are every bit as stringent today as in the '70s. Machines are faster, but we expect them to achieve even more. Clock rates have stagnated, but problem sizes have not.

The failure modes of C and C++ ints are less bad than is typical of construction materials. The common mistake is hiring CS people and expecting them to think and act like engineers.


Engineering is the art of making trade-offs between multiple competing concerns. Efficiency is one such concern but it is not the only one. Safety matters too. You can make aircraft more efficient by eliminating all of the safety equipment and redundant systems. That might have made sense for the Wright Flyer but no sane person would advocate that for a modern passenger airliner.

If the subtraction problem were the only issue that might be manageable but it is not. It is just one example of a very long list of such problems. C and C++ are metaphorical minefields of gotchas, any one of which can result in a literal crash or a security breach, which in today's world can be every bit as damaging as a physical failure of a critical system. Actual examples of this occur on an almost daily basis. It is akin to trying to build a modern passenger aircraft out of wood and muslin. It really is unacceptable.


Yet, you would be happy to release a safety-critical system with only run-time type checking?

Engineering is a discipline that requires years of study and supervised practice, not reducible to "the art of making trade-offs". Non-engineers hired to perform engineers' jobs like to make this sort of mistake. Management might not be able to tell the difference. Thence 737-MAX.


> Yet, you would be happy to release a safety-critical system with only run-time type checking?

https://en.wikipedia.org/wiki/Tu_quoque

To say nothing of the fact that your backhanded assertion is completely baseless. Also, C and C++ provide neither run-time safety nor compile-time safety, so your argument is a non-sequitur as well.


Ah, I see you were just trolling. Well played, I guess?


No. I stand by everything I wrote.


To me signed vs. unsigned by default seems like asking which leg to cut off by default, left or right?

The correct default, IMHO, is a proper numeric tower paired with a tagged-pointer implementation for "small" integers (range ~ ± 2^62 or so). As others have pointed out, this will essentially never overflow in practice, so memory traffic is the same as for 64-bit primitive integers. AFAICT, we are now good enough at optimising tagged pointer/small integer arithmetic that on modern machines, memory traffic should be the limiting factor.

I was vacillating between different options (for Objective-S, http://object.st) for a long time, because while I love C's "cheap and cheerful" use of int, I also love Smalltalk's (and Lisp's etc.) "we do the right thing".

What pushed me over the edge towards the numeric tower as default position was DJB's retrospective paper in Qmail security: https://cr.yp.to/qmail/qmailsec-20071101.pdf

So while "cheap and cheerful" works really well in regular situations, it does not work well at all when you're under attack, because attackers will go 100% for the edge cases, the ones that "never happen". So we really need to, as DJB put it, "simplify integer semantics". Having those overflows just cause a small slowdown seems like a reasonable tradeoff, we don't need attacks to execute at optimal efficiency.

Given the advances we have in software and hardware, there simply is no reasonable case to be made for not having a proper numeric tower as default.

When you need better performance or specialised semantics, the primitive types should be readily available. But not the default.


For languages that don't give you a choice of signed vs. unsigned - giving you signed and sometimes arbitrary-precision integers - this debate is irrelevant.

I'll put it in my same mental bucket of "things that I no longer worry about", along with buffer overflows, wild pointers, etc that are eliminated by memory-safe languages.


Slight nitpick:

> Here’s a somewhat non-exhaustive list of all the undefined behavior of signed integer arithmetic in LLVM which applies to all languages which use LLVM:

> [... division examples ...]

> INT_MAX - INT_MIN

That's not true: 'sub i32 INT_MAX, INT_MIN' is defined perfectly fine in LLVM in the (modular arithmetic) way you'd expect as long as the 'nsw' (no signed wrap) flag isn't set on the instruction. For the most part, LLVM IR by default doesn't care about the signedness of values, integer types are just treated as a bunch of bits with modular twos complement arithmetic. Exceptions to this rule are explicit, e.g. the mentioned nsw flag (and there's also nuw for "no unsigned wrap").

The division examples are correct because if you want a signed divide, you have to explicitly encode it as a signed divide instruction in LLVM IR, which has the listed undefined behaviors baked in.


I wonder, is there any language that automatically promotes ints to more bits to ensure there is no overflow? The sum of two eight-bit ints is a nine-bit int (probably occupying 16 bits in memory, but that is an implementation detail):

    uint8 a = 230;
    uint8 b = 250;
    auto c = a + b;  // typeof(c) == uint9
Maybe it would even be able to detect that "c / 2" fits in a uint8.

I guess this could lead to an explosion of bits required, but only if you chain multiple operations and use type inference. If you explicitly assign to a fixed-width integer, it would truncate.

I know Python uses bignums under the hood, but is there any compiled and/or strictly typed language that does this? Because it seems modulo arithmetic and silent under/overflow is most of the time not what you want.
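The closest you can get in plain C++ is widening by hand; a real "uint9" would need language support (Zig's arbitrary-width integer types are the nearest existing thing I know of). A sketch of the idea:

```cpp
#include <cstdint>

// Hand-written "widening" addition: the result type is chosen large
// enough to hold any possible sum of the operand types, so overflow is
// impossible by construction (255 + 255 = 510 < 65536).
uint16_t add_wide(uint8_t a, uint8_t b) {
    return (uint16_t)a + (uint16_t)b;
}
```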


That is generally not possible in a strongly-typed language unless you just use a bignum type to begin with.

Most numbers are small, and you aren't worried about overflow.

With big numbers, std::int64_t is hard to overflow. If you have numbers from an untrusted source, you can and should range-check them immediately on input.


Why not? Ada, Pascal, Nim and others have ranged integer types. Many modern languages use some kind of type inference. You could easily define that the type of an integer expression has to be large enough, for most expressions. It doesn't have to be general (I wouldn't be surprised if the general case is theoretically impossible). But you could fix the most obvious cases easily with this (for loops that can't go below 0 because of unsigned counters, calculating the average but the temporary overflows, ...).

And if the compiler gives you a warning... "Inferred variable would require a bignum (int128)" then you could either silence it by specifying the type explicitly, or maybe you are missing a manual range-check?

> If you have numbers from an untrusted source, you can and should range-check them immediately on input.

You should, but you are going to forget in some places. The point of a strongly-typed language to me is that the compiler will make sure you do. It is the edge cases and the places where you accidentally don't sanitize properly where you get evil bugs.


And if you do "forget in some places", Ada, Pascal, and Nim will not produce the correct response.

What will they do instead? We don't need to guess, we saw in the first Ariane 5: catastrophic loss of vehicle and payload, as a result of sending a crash dump report to the engine steering gimbals. $500M payload, in that case.


The problem isn't that signed or unsigned is the accepted default. The problem is that bounded integers are the default, and worse yet, that their behavior at the edges is counter-intuitive from a common sense perspective.

This was perfectly reasonable back when RAM was measured in kilobytes, and clock speed in megahertz. But these days, we write apps in HTML and JS, and package them with a browser to run. And, conversely, security is much more of a problem than it was decades ago - and integer overflow is a very common cause of security issues.

So, why aren't unbounded integers becoming the norm? I'm not saying we should throw int32 etc out, but rather treat them as low-level optimization tools - much like we treat, say, raw pointers in C# or Rust. Surely the default should be safety over performance?


Unfortunately if I were to try to use unsigned ints extensively, I'd be casting them back and forth constantly, because nearly every standard library function that I'd want to use expects int inputs or outputs ints.

A small improvement in correctness is washed out by a huge decrease in ergonomics.


You do know that signed and unsigned ints convert implicitly, without casts?


Which standard library?


I've seen plenty of bugs due to misuse of unsigned types, mostly coming from beginners trying to silence signed/unsigned comparison warnings with casts. Then they go on and do some arithmetic with the loop index, which results in an underflow or two. We now have hard rules against using unsigned types in arithmetic operations.

Where things get a bit more tricky is when working with data that has some native unsigned type - e.g. when reading an image from file, it will mostly consist of unsigned chars. There you also need conversion rules and everyone aligned on algorithms, APIs etc.

Mostly though it's a matter of maintainability - it's easier to wrap your head around signed integers (no pun intended), even for beginners.


This is an artifact of typing. Is your 16 bit value all ones? Then it might represent -1 or 65535 depending on what you believe the type to be. Perhaps it represents some sort of colour instead.

When working in an untyped environment, this sort of question doesn't come up. If the value is intrinsically unsigned then that is what you and hence the program assume those bits mean. If the value is intrinsically signed then that is your illusion. Any ideas about overflow are entirely in your mind. You tend to see techniques used in an untyped environment that would never occur to anyone in a signed environment. It's all just bits.
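Concretely, in a typed language the same bit pattern reads differently depending on which "illusion" you declare (sketch):

```cpp
#include <cstdint>

// All sixteen bits set, under two different typed interpretations:
uint16_t u = 0xFFFF;            // as unsigned: 65535
int16_t  s = (int16_t)0xFFFFu;  // as two's-complement signed: -1
// (the narrowing conversion is implementation-defined before C++20,
// but yields -1 on any two's-complement machine, and C++20 guarantees
// the modular result)
```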


One thing with array indices is that using signed means that you need to always check for negative in addition to checking for too large. Unsigned indices means you can just check against too large.

I think a lot of the scenarios where signed is safer are also situations where the application logic would be wrong, so I am a bit skeptical of that argument.

But ultimately, the biggest factor to consider is what your team or prospective team members in your industry are used to; different industries tend to have people who never expect unsigned values vs. those who always assume unsigned will be used.
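To make the first point concrete, here is a hypothetical bounds check written both ways (function names made up):

```cpp
#include <cstddef>

// With a signed index, a correct bounds check needs two comparisons...
bool in_bounds_signed(long i, size_t len) {
    return i >= 0 && (size_t)i < len;
}

// ...while with an unsigned index one suffices: negative values can't
// exist, so "out of range" is simply i >= len.
bool in_bounds_unsigned(size_t i, size_t len) {
    return i < len;
}
```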


> Of course this is dangerous as it’s a narrowing conversion, with a cast which silences a legitimate warning. In C and C++ it invokes undefined behavior when given specific large values and most certainly is exploitable. Most applications would just crash on inputs >= 0x7ffffffffffffffff.

No, it's not. Overflow on arithmetic is undefined, but on conversion it's implementation defined. In C++20 it's defined.


> Here’s a somewhat non-exhaustive list of all the undefined behavior of signed integer arithmetic in LLVM which applies to all languages which use LLVM:

> These are certain to produce invalid results in languages like Go, Rust, and Odin.

This is clearly untrue, since an operation (or indeed a type) at the language level need not translate to the same operation/type within LLVM.

For example, none of these are "undefined behaviour" in Rust.


The author makes good points, but the takeaway I get is that signed under/overflow should always be defined, maybe in a platform specific way. The one good argument for leaving it undefined was for better loop optimisation, but if we're going to use unsigned for loops, then what's the point.


I prefer unsigned in Rust but signed for C/C++. This is mostly because I'm always afraid that the number will be silently converted to a signed type at some point. Rust is much more explicit about when these conversions happen so it worries me less.


What I find interesting is that the implementation of `midpoint` is incorrect (should be `U difference = (U)y - (U)x;`), which is not a good look when you want to argue that unsigned makes everything more correct.


I sometimes use signed ints to indicate an error condition. Then you know if the result is < 0 some error occurred, otherwise you can continue with normal processing.


Wouldn't it be better to write

for (size_t i = size; i > 0; i--) { // ... }

and use i-1 inside the loop instead of using underflow?


I foresee two problems with that:

Problem 1:

If i is used multiple times inside the loop, then you might need to make a temporary variable for it, making it confusing:

    for (size_t i_plus_one = size; i_plus_one > 0; i_plus_one--) {
        size_t i = i_plus_one - 1;
        f(i);
        g(i);
        h(i);
        ...
    }

(The alternative is a bunch of i-1 all over the place, which is also confusing.)

Problem 2:

If the operations inside the loop are short, then the extra "i-1" arithmetic could result in unacceptable performance penalties.


or: "for (size_t i = size; i-- > 0;){...}"


I thought, given the title, that it was going to be tips and a list of services for accountless web use.


What fans of unsigned integers seem to really want is array access in the form of x[i mod LEN]. Signed integers "kind of" give you an error on overflow (except when you're unlucky or somebody is malicious). Is there a way to combine both? As in, give me an error on overflow in "debug builds" and always mod-LEN in "release builds"?


“In many ways signed integers are the null pointers of integers.”

What a great article.


I would have really loved to have different number types in JavaScript, and to be able to define them with TypeScript. Not from the performance perspective (like different sizes), but the integer vs. float and unsigned vs. signed distinctions can be really useful.



