Passing Pointers

fzzzy · on July 23, 2012

Yes! I really hope more people catch on to why this is an excellent idea.

Urls as pointers are not needed in a world where Facebook controls everything and has access to everything by id in it's backend.

A world where services provide and consume urls is a world where it doesn't matter what server something is on, everyone can participate.

icebraining · on July 23, 2012

It's always refreshing when people building web services actually understand the web. (Note: not sarcasm, unfortunately this is rare).

ChuckMcM · on July 24, 2012

Except that if you understand the web you understand that a URL is worthless. Worse than worthless its down right dangerous. The moment you write a URL is the moment the clock starts ticking on the data that it refers to being the same data as the URL describes. This can fail in spectacular ways. A loooong time ago there was a web site that hosted comments on a golf courses that referenced to a hosted community site. The front side of the application would put a snippet of the comment and a link to the full comment. The service passed away into the web zombie land (the web site was still serving pages, the links still pointed at the comment site, and nobody was home). The comment site got sold or acquired and someone put up malware on every single inlink. Blammo armed and dangerous.

The concept the OP is going for is 'deferred work' which is to say not to pass around data that isn't going to be used. And that is indeed a noble goal, but you must have a way to vet that the pointer you passed still points to the thing you thought it did, or you will find out what so many C programmers have discovered about caching pointers, bad bad bad idea.

brettcvz · on July 24, 2012

The central issue here is trust. If you trust the provider of the url to maintain the link, ensure that it stays alive and pointed to what you expect, then it's fine.

In the C caching pointers case, the issue is that the system makes no guarantees about the "live"-ness of any prior pointers, whereas on the web this is entirely possible and encouraged (See oft-cited post about "Cool URIs don't change)

doktrin · on July 24, 2012

The central issue here is trust. If you trust the provider of the url to maintain the link, ensure that it stays alive and pointed to what you expect, then it's fine.

How can that ever be a realistic expectation over anything but the immediate and short term? Not to mention that in the overwhelming majority of cases the "provider" of a URL is not the party responsible for maintaining it.

I have to agree with the parent - unless the recipient of the URL [continuously] validates the source, she is simply asking for trouble down the road.

BHSPitMonkey · on July 24, 2012

I don't think I trust any web site in that regard, though. I could even see Google yanking the floor out from under me as a consumer of one of their URL schema.

icebraining · on July 24, 2012

If you didn't trust any web site, you wouldn't click on any link.

Trusting a website doesn't mean you have to trust them indefinitely. You can trust that the URL will be kept alive for a certain length of time - minutes, hours, days, etc - and deal with them accordingly.

icebraining · on July 24, 2012

The fact that you can't treat an URL as static doesn't make it "worthless", that's ridiculous.

If you want long term storage, you can always GET the content and store it locally. But if you only get a copy of the data, you can't do everything else (PUT back updated data, poll for changes, etc).

A loooong time ago there was a web site that hosted comments (...) The comment site got sold or acquired and someone put up malware on every single inlink. Blammo armed and dangerous.

And for all you know you can get malware from a data copy as well. Using URLs is not a reason to disregard basic security practices, like verifying the content that you receive from the other service.

The concept the OP is going for is 'deferred work' which is to say not to pass around data that isn't going to be used. And that is indeed a noble goal, but you must have a way to vet that the pointer you passed still points to the thing you thought it did

Which you have: the documentation says the URLs are valid for 4 hours.

Mythbusters · on July 24, 2012

I think the point being made here is only valid when you just move the data around and don't alter it. filepicker is a perfect example where you pass the data around without actually touching it and such a usage is a perfect case for passing the "pointers" around. In most other cases, you'd want to pull it down the moment you are going to transform it.

icebraining · on July 24, 2012

Even if you're going to pull it down, passing an URL around is still better. It means the client software can schedule the download for a few minutes or even hours to reduce peak bandwidth demands, they can PUT an updated version back, they can poll for changes, etc.

throwaway54-762 · on July 23, 2012

Aw, and I expected a C/C++ article! Still, this is good stuff: "zero copy" for web content.

brettcvz · on July 23, 2012

Ha good point - Same concept, way different level of abstraction

liyanchang · on July 23, 2012

"my data stores and yours are practically collocated".

Truth. Every time I ping from my server and get ridiculously low response times, I have to pause to think before thinking "Thank you AWS".

brettcvz · on July 23, 2012

Until everywhere has fiber to the curb, the internet backbone is going to be orders of magnitude faster than the last-mile speeds

ori_b · on July 23, 2012

At which point, the demand for bandwidth will go up, and the internet backbone will have to grow, and it will still be orders of magnitude greater than last mile speeds. Internet backbone will always be faster than last mile speeds by necessity.

brettcvz · on July 23, 2012

Fair point. It will be interesting to see in what areas the demand for bandwidth will go up. At some point doubling resolution is no longer noticeable.

ukd1 · on July 23, 2012

Games with massive textures (retina?), streaming lossless music, 1080p / 4k &&|| 3D 'netflix'?

Bandwidth will always get used up. "64k is enough"...LOL.

kstenerud · on July 24, 2012

This will only work so long as services copy the contents of the URL you give. This opens up a whole slew of security and permission issues. Otherwise the original link becomes the weak link in a potentially long chain of links.

A pointer is handy and convenient until the resource it points to disappears.

brettcvz · on July 24, 2012

Can you elaborate on the security/permission issues? It's something we spend a lot of time thinking about, would love to learn more about your concerns and how we can address them.

I think there are two use cases for URLs as pointers, transient and persistent. In the transient case, you use the URL as a temporary representation, pass it around to the relevant hops, and then fetch it wherever the final resting place may be. In the persistent case (say you're building a web app where users upload content and you want a permanent link), there needs to be somewhere in the line that a service "guarantees" that the link will remain alive and true so that the resource it points to sticks around. So in our case, what we do is if the developer asks for a persistent link, we will take a snapshot of the content and persist it encrypted on our CDN for them, making sure the link stays alive

kstenerud · on July 24, 2012

Depending on the URL, you may need to be logged in to see the content. So in such a case, the service that responds to the URL provided would need to support granting permission to a 3rd party (in the case of the 3rd party copying the data and hosting it), or potentially granting permission to everyone (if the 3rd party is only going to host the URL, not a copy of the data).

As well, you'd need to deal with abuse issues. If the 3rd party is going to host a copy, you need to make sure it can't be used as a DOS attack against the URL's original host, for example.

The web is chaotic by nature. That is its strength, but it's also a weakness when it comes to data longevity. Without some kind of standard for guaranteeing data persistence that reaches critical mass (such as over 80% of content hosters implementing it), this will be a very tough nut to crack.

Another problem I see is legal. DMCA takedowns are so easy to do that a dead URL is only a lawyer letter away, even if their claim is completely invalid. Now suddenly a single takedown will affect potentially hundreds of sites instead of just one (in the case of URL pointers rather than data copies).

If a content site is DDOSed, suddenly the damage is magnified due to all of the other sites depending on a single point of failure.

joshma · on July 24, 2012

I'd also point out that URLs aren't exactly pointers as they don't (all) support writes in addition to reads. It'd be interesting to see a webservice support locking, reading, and writing from URLs as pointers.

joshma · on July 24, 2012

Nice, didn't notice that! (For those interested, it's actually documented here: https://developers.filepicker.io/docs/web/#fpurl-contents)

A 501 sounds like the closest error code, and I'd say having both asynchronous and synchronous modes of locking might be useful. Synchronous just holds the connection open (certain frameworks don't mind long-lived connections) while an asynchronous method might pass in a callback_url in the request to be hit when the file is ready, in the case of lockage.

(NB: to be honest I'm not too sold on the demand for locking vs ovewriting, I guess I threw it in the list of [things that files can do]. Might be interesting to see this need evolve as files move to the "cloud" though.)

EDIT: While I'm at it, a PUT method for creating files could be cool too, to let people use filepicker without the JS widget.

// oops, missed the link. meant to reply to sibling comment

icebraining · on July 24, 2012

I'd also point out that URLs aren't exactly pointers as they don't (all) support writes in addition to reads.

    return &"Read only data";

brettcvz · on July 24, 2012

We do read/write, re. lock would you want to prevent all reads? Do a 503 or just hang the connection until the lock is undone?

stuffihavemade · on July 23, 2012

I'm running into this problem right now with S3. I have a bunch of files on a cdn that I want to store in my bucket but (as far I know), I have to download them all to my machine before storing them. I'd love for the api to accept a URL.

brettcvz · on July 23, 2012

If you can convince S3 to integrate with Filepicker.io we'd put a statue of you in our office.

bjornsing · on July 23, 2012

So true. For example, I love gmail but can't believe how many times a day I download an attachment just to upload it as an attachment to another email. It's ridiculous.

ukd1 · on July 23, 2012

This is cool, but I wonder when it will actually be done by reference rather than reference then copy.

When will I be able to keep a "this is in use here" record for a file that is stored else where. Whilst way better than download-upload on broadband / cell; it still seems dumb to copy in the first place, even if it is on internet backbone.

clord · on July 23, 2012

This would require some sort of single-sign-on, or capabilities system. Would love to see capabilities-based security for web services, actually. "Here is a token that grants permission to service X to perform action Y for duration Z." Can OpenID and ilk do this?

Anyway, apps that don't need security should be as you describe already.

Kevin_Marks · on July 23, 2012

This is pretty much exactly what OAuth does - enables the user to authorize a web service to access another one on their behalf, with per-app, per-user constraints.

brettcvz · on July 23, 2012

I mean you can do that right now with URLs no? When you need the content, do a GET request on the link, when you want to save do a POST. This is especially nice for images as you can just throw the url we pass back in an img tag

icebraining · on July 23, 2012

Nitpick: you'd save with a PUT. Otherwise, yes, that's exactly what one should do.

brettcvz · on July 23, 2012

Yes - my bad. To be fair, we support both, but PUT is the proper verb

Scene_Cast2 · on July 24, 2012

Yay, garbage collection and memory management! How do you know whether a URL has expired or not? I'm assuming temp URLs, which is quite reasonable in a lot of cases.

brettcvz · on July 24, 2012

HEAD requests?