Wayback machine image URLs still loading images from original Amazon S3 URL #1379
Hi @jywarren, I am happy to check this out.
Hi @jywarren, I checked the code. The transformation in the function below (in archive.js) is responsible for the behaviour you are describing. If my memory serves me right, we designed it this way at the time because of issues with accessing the images programmatically via IA. I also observed that the Wayback Machine itself simply loads the images from S3. What do you think?
Hmm, did this apply only to JSON maybe? Would you mind trying to remove that so that it loads directly from the Wayback Machine?
Thanks for finding that!!!
On Sun, Mar 12, 2023, 2:48 PM Segun wrote:
// where imageSrc is in format: https://web.archive.org/web/20220803171120/https://s3.amazonaws.com/grassrootsmapping/warpables/48659/t82n_r09w_01-02_1985.jpg
// returns https://s3.amazonaws.com/grassrootsmapping/warpables/48659/t82n_r09w_01-02_1985.jpg or
// returns same url unchanged (no transformation required)
function extractImageSource(imageSrc) {
  if (imageSrc.startsWith('https://web.archive.org/web/')) {
    return imageSrc.substring(imageSrc.lastIndexOf('https'), imageSrc.length);
  }
  return imageSrc;
}
*Illustration 1:*
[image: img]
<https://user-images.githubusercontent.com/1612359/224565688-4ebdb4cc-6b7b-4ba1-919b-18e1fa965c06.PNG>
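For reference, here is a minimal sketch of how the function above behaves on the two URL shapes that appear in this thread. One thing worth noting: `lastIndexOf('https')` only finds an inner URL that itself starts with `https`, so a Wayback URL wrapping an `http://` S3 link comes back unchanged. The `unwrapWaybackUrl` variant and its regex are hypothetical, written only for illustration; they are not code from archive.js.

```javascript
// Copy of the function quoted above, unchanged in behaviour.
function extractImageSource(imageSrc) {
  if (imageSrc.startsWith('https://web.archive.org/web/')) {
    return imageSrc.substring(imageSrc.lastIndexOf('https'), imageSrc.length);
  }
  return imageSrc;
}

// Hypothetical variant (an assumption, not the project's code): matches the
// Wayback prefix, skips the timestamp segment (e.g. "20200506081918id_"),
// and returns the wrapped URL whether it is http or https.
const WAYBACK_WRAPPER = /^https?:\/\/web\.archive\.org\/web\/[^/]+\/(https?:\/\/.+)$/;

function unwrapWaybackUrl(imageSrc) {
  const match = WAYBACK_WRAPPER.exec(imageSrc);
  return match ? match[1] : imageSrc;
}

const httpsWrapped =
  'https://web.archive.org/web/20220803171120/https://s3.amazonaws.com/grassrootsmapping/warpables/48659/t82n_r09w_01-02_1985.jpg';
const httpWrapped =
  'https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg';

console.log(extractImageSource(httpsWrapped)); // inner https S3 URL
console.log(extractImageSource(httpWrapped));  // the Wayback URL, unchanged
console.log(unwrapWaybackUrl(httpWrapped));    // inner http S3 URL
```

Either way, the result is that the page requests the image from S3 rather than from the Wayback Machine, which matches what the issue describes.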
Okay @jywarren, I'll look into this. Many thanks!
Ah yes. I see - we get this error if we don't do that --
I'm not sure... is there another way to access https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg without CORS issues? Otherwise, we could... upload that entire directory into an Archive collection, and serve it from there. That is, wayback URLs have CORS limitations, but images in regular |
Yes, I pointed out the CORS limitation in my previous message. It was the reason I fetched from S3 directly. Okay, but is there something wrong with fetching from S3, given that the legacy JSON files all have image sources pointing to S3 either directly or indirectly? For instance, https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg simply points to S3 indirectly, nothing more.
Yes, sorry, just agreeing and confirming from my test. Thank you! The only issue with s3 is that it costs Public Lab money to host -- it's not forever storage. I think perhaps the best choice is to create an archive.org collection and add to this logic in I'm working on uploading all the files, but it'll be a while. We can check in here again once it's complete! |
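A rough sketch of what that rewrite could look like once the collection exists. Everything specific here is an assumption: the function name `toArchiveCollectionUrl` is hypothetical, the item identifier is passed in because the real one is not known yet, and the code assumes the uploaded files keep the same relative paths they have under `warpables/` on S3.

```javascript
// Matches the S3 warpables path whether the URL is a direct S3 link or is
// wrapped inside a Wayback Machine URL (the pattern is not anchored at the start).
const S3_WARPABLES = /https?:\/\/s3\.amazonaws\.com\/grassrootsmapping\/warpables\/(.+)$/;

// Hypothetical helper: rewrites an S3 (or Wayback-wrapped S3) image URL to an
// archive.org download URL for the given item. Other URLs are returned unchanged.
// Assumes the item mirrors the relative paths under warpables/.
function toArchiveCollectionUrl(imageSrc, itemId) {
  const match = S3_WARPABLES.exec(imageSrc);
  if (!match) {
    return imageSrc;
  }
  return `https://archive.org/download/${itemId}/${match[1]}`;
}

// 'example-item' is a placeholder, not the real collection identifier.
console.log(
  toArchiveCollectionUrl(
    'https://web.archive.org/web/20200506081918id_/http://s3.amazonaws.com/grassrootsmapping/warpables/417/img_0135.jpg',
    'example-item'
  )
);
```

Taking the identifier as a parameter keeps the sketch independent of whatever name and layout the final upload ends up with.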
Ha! Okay, I understand now. So the archive.org option is definitely the route to take. I will check back then.
Gosh, it's going to take a while! It's 631,813 files, and I'm only at downloading 3875... I may try another way at a remote server that's faster... we'll see!
Yeah... this has to take a while
Is this issue being worked on?
Hi, we are still working on uploading the archive.org collection, apologies!
I found a strange issue when I pointed at a collection of JSON files which have had images routed to the Internet Archive's Wayback Machine caches.
As you can see, the image links are routed to Wayback URLs: https://ia601603.us.archive.org/20/items/mapknitter-wayback/ceres--2.json :
i.e.: https://web.archive.org/web/0id_/https://s3.amazonaws.com/grassrootsmapping/warpables/305268/PuglisiTerrazzeHaghiaTriadaCretaAntica2007-28.jpg
However, when I actually load a page like this, somehow it still loads images directly from Amazon s3, not the Internet Archive:
https://publiclab.github.io/Leaflet.DistortableImage/examples/archive?json=https://archive.org/download/mapknitter-wayback/ceres--2.json
I inspected in the console and still can't figure it out.
@segun-codes @7malikk I was curious, if you had an interest in this, what do you think is happening here? Could any application logic we've written be causing this?
For example, the page at https://publiclab.github.io/Leaflet.DistortableImage/examples/archive?json=https://archive.org/download/mapknitter-wayback/ceres--2.json still loads https://s3.amazonaws.com/grassrootsmapping/warpables/306187/DJI_1207.JPG
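One way to narrow this down might be to confirm which hosts the JSON itself points at and compare that with the hosts in the browser's network tab. The sketch below walks any parsed JSON value and counts URL strings by hostname, so it does not depend on the file's schema. The sample object and its field names (`images`, `src`) are made up for illustration and are not the real structure of ceres--2.json.

```javascript
// Recursively collect every string value that looks like an http(s) URL.
function collectUrls(value, found = []) {
  if (typeof value === 'string') {
    if (/^https?:\/\//.test(value)) found.push(value);
  } else if (Array.isArray(value)) {
    value.forEach((item) => collectUrls(item, found));
  } else if (value !== null && typeof value === 'object') {
    Object.values(value).forEach((item) => collectUrls(item, found));
  }
  return found;
}

// Count URLs by hostname. A Wayback-wrapped URL counts under web.archive.org,
// because the wrapped S3 URL is only part of its path.
function countByHost(urls) {
  const counts = {};
  for (const url of urls) {
    const host = new URL(url).hostname;
    counts[host] = (counts[host] || 0) + 1;
  }
  return counts;
}

// Hypothetical sample data; the field names are assumptions.
const sample = {
  images: [
    { src: 'https://web.archive.org/web/0id_/https://s3.amazonaws.com/grassrootsmapping/warpables/305268/PuglisiTerrazzeHaghiaTriadaCretaAntica2007-28.jpg' },
    { src: 'https://s3.amazonaws.com/grassrootsmapping/warpables/306187/DJI_1207.JPG' },
  ],
};

console.log(countByHost(collectUrls(sample)));
```

If the JSON only lists web.archive.org but the page requests s3.amazonaws.com, the rewrite is happening in application code rather than in the data.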