
Should be prefaced with, "I think".

Having done user research on this by speaking to data scientists, I can say that static typing is desired by a nonzero number of people who practice what we would consider to be data science and machine learning. Much like how TypeScript is seen as a revelation to hordes of JavaScript programmers who have never used static types before, the ability to get some level of correctness verification at design time matters.
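A minimal sketch of what "correctness verification at design time" buys you, in Python (hypothetical `mean` helper, not from any library): the annotations let a static checker such as mypy flag misuse before anything runs.

```python
def mean(values: list[float]) -> float:
    """Average a list of numbers; the annotations document intent and
    let a static checker (e.g. mypy) reject bad calls at design time."""
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0

# mean("not a list")  # a checker flags this before the code runs;
# without annotations, the mistake only surfaces at runtime.
```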



The more time I spend with strongly typed languages, the more I am convinced it is the right way to go. For modern languages with good type inference and good tools for protocols/interfaces not tied to an inheritance hierarchy, it is at worst a minor inconvenience for a huge benefit.


> I can say that static typing is desired by a nonzero number of people who practice what we would consider to be data science and machine learning

Many of them would trade static typing for fast prototyping any time.

Data science is a really nebulous term covering many drastically different domains of CS. Many data scientists I've talked with don't really produce code as a product; they write code to produce analyses, which are the actual deliverable. For them, code is ad hoc and disposable, created on demand and left in the dust until rediscovered when the next mission comes along.

Some of the code does survive and enter the production stage, and I guess that is where they would seek some assurance from static typing. But I do think they could mitigate most of the pain if they committed to writing some unit tests/functional tests, yet such awareness is rare among the data scientists I know and have worked with.
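A hedged sketch of the kind of test meant here (the `normalize` helper and its data are hypothetical): a few functional assertions pin down the behavior of analysis code before it enters production.

```python
def normalize(values):
    """Scale values into the [0, 1] range; typical ad-hoc analysis
    code that survived long enough to need a safety net."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize():
    out = normalize([2.0, 4.0, 6.0])
    assert out == [0.0, 0.5, 1.0]          # endpoints and midpoint land where expected
    assert all(0.0 <= v <= 1.0 for v in out)

test_normalize()
print("tests passed")
```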

So all in all, yes, static typing MIGHT help in some ways, but I don't think it addresses the underlying pain point all that much.


> Many of them would trade static typing for fast prototyping any time.

These need not be at odds. Many ML-family languages like F# or OCaml, by use of type inference, get you type safety without having to write a bunch of annotations or sacrifice fast prototyping. And certainly in F# there is a history of productive tooling that lets you prototype easily. Simply writing some F# code in an F# script in an IDE, hitting Alt+Enter, and letting it execute in an interactive shell is hugely productive for exploratory tasks. And features like Type Providers build out types for an arbitrary data set, letting you guarantee your code is actually correct for the data.

What I've mentioned isn't without its flaws, and eventually someone is going to reach head-scratching problems just as they would in any other environment. I don't think there's an objective way to measure productivity across a wide range of professionals, but I do believe that some subset of them would prefer static types for their work. This is backed by conversations with some of them about problems they encounter.


Although I am a big fan of a couple of dynamic languages, when a codebase scales we really need static types to make any sense of it, even to our older selves a couple of months down the line.

So gradual typing like in Julia is already a good thing for having the best of both worlds.


Correctness verification at the level that data scientists need can generally be achieved with optional typing (presuming a well-designed type system).


Perhaps! I personally think it's still a very young field, and there's likely a spectrum of professionals who prefer some strong degree of typechecking.

This is being explored with "Live Checking" in F#[0], which offers a form of static typing over TensorFlow without actually forcing you to express every complex interaction with data in types.

[0]: https://github.com/fsprojects/TensorFlow.FSharp#live-checkin...


> achieved with optional typing (presuming a well designed type system)

Enter stage left: Julia

Julia is already pretty great; I'd really love to see what cool stuff we could have with a swell in community size and investment!


Yeah, that's kind of what I'm referring to, but the default array typing in Flux doesn't encode tensor dimensionality in the type system. If it did (which it very easily could in Julia), you wouldn't wind up with a situation where your learning task halts in the middle of a training run, which can happen in Flux.
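A plain-Python sketch of that failure mode (hypothetical shapes, no ML library involved): when dimensions live only in the data rather than the types, a mismatch surfaces as a runtime error partway through a run instead of a design-time error.

```python
def matmul(a, b):
    """Multiply matrices given as nested lists; dimensions are
    checked only at runtime, when the operation executes."""
    if len(a[0]) != len(b):
        raise ValueError(f"shape mismatch: {len(a[0])} vs {len(b)}")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

weights = [[0.1, 0.2], [0.3, 0.4]]   # a 2x2 "layer"
good_batch = [[1.0], [1.0]]          # 2x1: compatible
bad_batch = [[1.0], [1.0], [1.0]]    # 3x1: incompatible, but nothing complains yet

matmul(weights, good_batch)          # earlier "epochs" succeed first...
try:
    matmul(weights, bad_batch)       # ...then the run halts here, mid-training
except ValueError as e:
    print("halted mid-run:", e)
```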


Due to the way that code composition works in Julia, there is no real “default” array for Flux. Rather, you can lift in any array type that you like. The GPU arrays are an excellent example of this: Flux “knows” nearly nothing about GPUs (apart from a few convenience functions), yet works perfectly when using a GPU array type. So there is nothing stopping you from lifting in, say, StaticArrays [1], which carries the sizes in the type, or NamedArrays [2], where dimensions have explicit names – the latter being superior in practice to the former in my opinion, or perhaps someone is up for marrying the two?

[1]: https://github.com/JuliaArrays/StaticArrays.jl

[2]: https://github.com/davidavdav/NamedArrays.jl

In brief, it is not the duty of the automatic differentiation package to favour a specific array type – it just works for all of them, which is something that I find fairly magical with Julia.


1) It is not the duty of AD to favor an array type, but Flux is an ML library. When you do something like Chain() or Dense() or LSTM() in Flux, which is very obviously an ML tensor operation, it SHOULD pick reasonable fixed (or variable!) tensor dimensions. This is maybe not so easy, but it should be doable. Likewise, I wish Flux had "batch" and "minibatch" types with specifiable dimensions, so that if you try to hook up data to layers of the wrong shape it gives an early warning.
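A sketch of that wished-for early warning in Python (the `Dense`/`Batch` API here is entirely hypothetical, not Flux): carry the expected dimensions on both the layer and the batch, and fail fast at wiring time rather than mid-training.

```python
class Batch:
    """Data tagged with its feature dimension."""
    def __init__(self, data, feature_dim):
        self.data = data
        self.feature_dim = feature_dim

class Dense:
    """A layer that knows its expected input/output dimensions."""
    def __init__(self, in_dim, out_dim):
        self.in_dim, self.out_dim = in_dim, out_dim

    def check(self, batch):
        # Early warning: reject mismatched shapes before any training runs.
        if batch.feature_dim != self.in_dim:
            raise TypeError(f"layer expects {self.in_dim} features, "
                            f"batch has {batch.feature_dim}")
        return True

layer = Dense(in_dim=4, out_dim=2)
print(layer.check(Batch(data=[[0.0] * 4], feature_dim=4)))  # True
```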

2) StaticArrays would be a good starting point, but the point of it is to optimize arrays by unrolling for loops and triggering SIMD (IIRC), and there are performance penalties when your arrays get really large, which they do in ML. Something LIKE the StaticArrays type system but without the over-optimization would be welcome.

3) (Kind of tangential.) I have beef with how the GPU is handled as a GPUArray in Julia. It really should be handled as a worker node using ClusterManagers-style semantics; you should be asynchronously sending tasks to the GPU as if it were a remote agent (which it kind of is, due to PCI bus bandwidth and latency bottlenecks) and waiting for the result to come back as a Future.
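The proposed semantics can be sketched with Python's standard `concurrent.futures` (the "GPU" here is just a one-worker thread pool standing in for a device): submit work asynchronously, keep the host busy, and block on a Future only when the result is needed.

```python
from concurrent.futures import ThreadPoolExecutor

def gpu_kernel(xs):
    """Stand-in for work shipped across the PCI bus to a device."""
    return sum(x * x for x in xs)

with ThreadPoolExecutor(max_workers=1) as device:
    fut = device.submit(gpu_kernel, [1, 2, 3])  # returns a Future immediately
    # ... overlap other host-side work here ...
    print(fut.result())                          # blocks only when the value is needed
```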


Regarding 3, can you make an issue or Discourse post for discussion?


In that regard Julia is hardly any different to TypeScript.



