• json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    More fun publisher surveillance: Elsevier embeds a hash in the PDF metadata that is *unique for each time a PDF is downloaded*, this is a diff between metadata from two of the same paper. Combined with access timestamps, they can uniquely identify the source of any shared PDFs.
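
    A minimal sketch of the comparison described above, assuming the metadata of each download has already been loaded into a Python dict (for instance from the JSON that `exiftool -json <file>` prints). The tag name `ExampleId` and all values are hypothetical placeholders, not the tag Elsevier actually uses.

```python
def diff_metadata(first: dict, second: dict) -> dict:
    """Return {tag: (value_in_first, value_in_second)} for tags whose values differ.

    A tag present in only one dict is reported with None on the missing side.
    """
    missing = object()  # sentinel so a real None value is not confused with "absent"
    changed = {}
    for tag in sorted(set(first) | set(second)):
        a = first.get(tag, missing)
        b = second.get(tag, missing)
        if a != b:
            changed[tag] = (
                None if a is missing else a,
                None if b is missing else b,
            )
    return changed


# Hypothetical metadata from two downloads of the same paper.
download_a = {"Title": "Example paper", "Pages": 12, "ExampleId": "aaaa"}
download_b = {"Title": "Example paper", "Pages": 12, "ExampleId": "bbbb"}

print(diff_metadata(download_a, download_b))
```

    Any tag that survives this diff across two downloads of the same paper is a candidate per-download identifier.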

    [Image: metadata diff between two downloads of the same paper]
  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    You can see for yourself using exiftool. To remove all of the top-level metadata, you can use exiftool and qpdf:

        exiftool -all:all= <path.pdf> -o <output1.pdf>
        qpdf --linearize <output1.pdf> <output2.pdf>

    To remove *all* metadata, you can use dangerzone or mat2.
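
    The two commands above can also be driven from Python. This is a sketch, not a vetted tool: it assumes `exiftool` and `qpdf` are installed and on `PATH`, and the function names `strip_commands` and `strip_metadata` are made up for illustration. The flags are exactly the ones quoted in the tweet.

```python
import subprocess


def strip_commands(src, tmp, dst):
    """Build the two command lines from the thread, in the order they must run."""
    return [
        # Step 1: write a copy of src to tmp with the top-level metadata removed.
        ["exiftool", "-all:all=", str(src), "-o", str(tmp)],
        # Step 2: rewrite (linearize) the intermediate file into dst.
        ["qpdf", "--linearize", str(tmp), str(dst)],
    ]


def strip_metadata(src, tmp, dst):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in strip_commands(src, tmp, dst):
        subprocess.run(cmd, check=True)


# Usage (needs both tools installed; file names are placeholders):
# strip_metadata("paper.pdf", "paper.step1.pdf", "paper.clean.pdf")
```

    Building the argument lists separately from running them makes it easy to print and inspect the commands before anything touches a file.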

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    Also present in the metadata are NISO tags for document status indicating the "final published version" (VoR), and limits on what domains it should be present on. Elsevier scans for PDFs with this metadata, so good idea to strip it any time you're sharing a copy.

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    Links:
    exiftool: exiftool.org
    qpdf: qpdf.sourceforge.io
    dangerzone (GUI, render PDF as images, then re-OCR everything): dangerzone.rocks
    mat2 (render PDF as images, don't OCR): 0xacab.org/jvoisin/mat2

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    here's a shell script that recursively removes metadata from pdfs in a provided (or current) directory as described above. For mac/*nix-like computers, and you need to have qpdf and exiftool installed: gist.github.com/sneakers-the-r…
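
    The linked gist is a shell script and is not reproduced here. As a rough Python sketch of just the "recursively find PDFs in a provided (or current) directory" part, under the assumption that a PDF is any regular file whose extension is `.pdf` in any letter case (`find_pdfs` is a hypothetical helper name):

```python
import sys
from pathlib import Path


def find_pdfs(root="."):
    """Return every regular file under root (recursively) with a .pdf extension."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Use the directory given on the command line, else the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for pdf in find_pdfs(target):
        print(pdf)
```

    Each path printed here would then be passed through the exiftool and qpdf steps described earlier in the thread.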

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    The metadata appears to be preserved on papers from sci-hub. since it works by using harvested academic credentials to download papers, this would allow publishers to identify which accounts need to be closed/secured

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    for any security researchers out there, here are a few more "hashes" that a few have noted do not appear to be random and might be decodable. exiftool apparently squashed the whitespace so there is a bit more structure to them than in the OP: gist.github.com/sneakers-the-r…

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    this is the way to get the correct tags: (on mac i needed to install gnu grep with homebrew `brew install grep` and then use `ggrep` ) will follow up with dataset tomorrow.

    horsemankukka Profile Picture

    Kukka de Bierguirb Häst @horsemankukka

    4 years ago

    this is the way to get the correct tags: (on mac i needed to install gnu grep with homebrew `brew install grep` and then use `ggrep` ) will follow up with dataset tomorrow.

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    of course there's smarter watermarking, the metadata is notable because you could scan billions of pdfs fast. this comment on HN got me thinking about this PDF /OpenAction I couldn't make sense of earlier, on open, access metadata, so something with sizes and layout...

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    updated the above gist with correctly extracted tags, and included python code to extract your own, feel free to add them in the comments. since we don't know what they contain yet not adding other metadata. definitely patterned, not a hash, but idk yet.

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    you go to school to study "the brain" and then the next thing you know you're learning how to debug surveillance in PDF rendering to understand how publishers have so contorted the practice of science for profit. how can there be "normal science" when this is normal?

  • json_dirs Profile Picture

    jonny saunders @json_dirs

    4 years ago

    follow-up: there does not appear to be any further watermarking: taking two files with different identifying tags, stripping metadata, and relinearizing with qpdf's --deterministic-id flag yields PDFs identical with a diff, ie. no differentiating watermark (but plz check my work)
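
    One way to repeat this check, sketched in Python. It assumes both copies have already had their metadata stripped, and that `qpdf` is on `PATH`. The helper names are hypothetical; `--linearize` and `--deterministic-id` are the qpdf flags named in the thread. Hashing is used here in place of a textual diff, which answers the same yes/no question of whether the two outputs are byte-identical.

```python
import hashlib
from pathlib import Path


def relinearize_command(src, dst):
    """qpdf invocation that rewrites src to dst with a content-derived /ID."""
    return ["qpdf", "--linearize", "--deterministic-id", str(src), str(dst)]


def sha256_of(path):
    """Hex SHA-256 digest of a file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_bytes(first, second):
    """True when the two files are byte-for-byte identical."""
    return sha256_of(first) == sha256_of(second)


# Usage (file names are placeholders; run each command with subprocess.run):
# relinearize_command("copy1.stripped.pdf", "copy1.out.pdf")
# relinearize_command("copy2.stripped.pdf", "copy2.out.pdf")
# same_bytes("copy1.out.pdf", "copy2.out.pdf")
```

    If `same_bytes` returns False, a plain diff of the two outputs is the next step for finding where they diverge.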

  • Wikisteff Profile Picture

    Stef @[email protected] Christensen @Wikisteff

    4 years ago

    @json_dirs I didn't expect hacktivism to be on the curriculum, but here we are.

  • WEDF_forum Profile Picture

    World Ethical Data Forum @WEDF_forum

    4 years ago

    @json_dirs This is fantastic work @json_dirs! Excellent!

  • Algoriitmo Profile Picture

    Muhammad al-Khwarizmi 🇵🇸 @Algoriitmo

    4 years ago

    @json_dirs Have you guys tried ROT13ing the strings before decoding? That NN would turn into an AA, which would indicate a 00 byte somewhere
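
    A small sketch of the suggestion above. The reply only says "decoding"; treating that as base64 is an assumption made here because "AA" is how base64 encodes a zero byte. The input `"NN"` is the fragment mentioned in the reply, not a full tag from the PDFs.

```python
import base64
import codecs


def rot13(text):
    """Rotate ASCII letters by 13 places; digits and symbols are unchanged."""
    return codecs.encode(text, "rot13")


def rot13_then_b64decode(text):
    """ROT13 the string, pad it to a multiple of 4 characters, then base64-decode.

    Raises binascii.Error for inputs that are not valid base64 after rotation.
    """
    rotated = rot13(text)
    padded = rotated + "=" * (-len(rotated) % 4)
    return base64.b64decode(padded)


print(rot13("NN"))                 # AA
print(rot13_then_b64decode("NN"))  # b'\x00'
```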
