Sebastian Majstorovic @storytracer
Digital Historian & Data Consultant | https://t.co/fev0QjCWjp | https://t.co/yqa5eIfpTu | Co-Founder @sucho_org storytracer.org Cologne Joined November 2007
Tweets 802
Followers 2K
Following 813
Likes 2K
Contributing datasets will be a top way to support machine learning in 2024. @Dorialexander and @ana_stasenko are providing some of the best examples of this kind of work via @pleiasfr, as demonstrated by how regularly their datasets trend on the Hub.
I made pdftext, a small tool that extracts text like pymupdf, but with an Apache license (mupdf is AGPL). It can pull out blocks and lines or plain text. Find it here - github.com/VikParuchuri/p… .
Announcing that we are on our way to solving a long-standing issue of document processing: correction of OCR mistakes. @pleiasfr publishes the largest dataset to date with automated OCR correction, 1 billion words in English, French, German and Italian huggingface.co/datasets/PleIA…
Announcing surya reading order! It predicts the order that a human would read a document in. It's useful for RAG, accessibility, and text extraction. It works on a variety of documents, layouts, and languages.
Big announcement: @pleiasfr releases a massive open corpus of 2 million YouTube videos under Creative Commons (CC-BY) on @huggingface. YouTube-Commons features 30 billion words of audio transcriptions in multiple languages, and soon other modalities huggingface.co/datasets/PleIA…
It's not about being GPU poor or rich; it's about being data poor or rich.
Very proud to be involved with @IWMDocU helping colleagues document war and prouder still to announce our expansion and move to Lviv as INDEX. @Index_Ukraine iwm.at/news/announcin…
PDF text recognition directly in the browser. I've been looking for a tool like this for years to recommend to historians and other people working with scanned documents on a daily basis. Amazing work as always by @simonw!
Incredible work by @Dorialexander and team at @pleiasfr! I've said for a couple years that there's more data in heaven and Earth than are dreamt of in your philosophy. But data work isn't "fun" so it doesn't happen. There are also 2T+ good permissive English tokens online today
@_DINUM @storytracer Common Corpus includes the largest English-language open dataset to date: 180b words, including the gigantic newspaper corpus of Chronicling America, which can now be explored thanks to @nomic_ai atlas.nomic.ai/data/aaron/pdn…
Announcing today in @WIRED the release of Common Corpus, the largest fully open corpus collection on HuggingFace: nearly 500b words (600-700b tokens) in the public domain. wired.com/story/proof-yo…
If you're into open LLMs (*really* open), stay tuned for big news on Wednesday :)
We are thrilled to announce the stable version of Haystack 2.0 🎉 We’ve been working on this for a while and now Haystack 2.0 has everything to help you implement composable LLM applications that are easy to use, customize, extend, optimize, evaluate, and deploy to production.…
Just published! A new @ProgHist lesson by @ak_blankenship, Quinn Dombrowski and Sarah Connell doi.org/10.46430/phen0… Thanks to @RubenRos8 and @a_heyer for their reviews, and to @lievesofgrass for editing.
Three weeks left to send us an abstract to participate in the ERC CAPASIA conference 'Commodities and Environments in Early Modern Global Asia, 1400–1800', 13-15 November 2024 @EUI_History . Bursaries available.
My new side project colorama.app lets you explore original color photos from the early 1900s on a world map. The first batch consists of over 60,000 Autochromes commissioned by Albert Kahn and digitized by @museealbertkahn . More collections coming soon!
Just a reminder that we are accepting paper proposals for our conference on 13-15 November 2024 @EUI_History 'Commodities and Environments in Early Modern Global Asia, 1400–1800'. Send us an abstract by 20 February! capasia.eu/comenv24/
Amazing reception of YugoGPT yesterday: we already had ~1000 sign-ups :) and overall really positive feedback! The media coverage was pretty cool too. I had: * Radio Television of Serbia (@RTS_Vesti) - main national frequency broadcaster in Serbia covering YugoGPT work on their…
On 1/1/2024, Mickey Mouse will enter the US public domain! What can you do with him? Does Disney still hold copyrights over later iterations of Mickey? Does trademark law play a role? The answer, ironically, is distinctly mouse-shaped. Learn more web.law.duke.edu/cspd/mickey/
Tom Ashby @tomaashby
8K Followers 3K Following Historian • Co-org @gpolthought • Editorial Board @Global_IH • Postdoc Fellow @EURONEWSproject • PhD @EUI_EU • prev @EinaudiOnlus @qmHPT @KebleOxford • autistic
Olga Byrska @OByrska
3K Followers 5K Following PhD researcher in post-WW2 intellectual history at the @EUI_EU. Teaching at @sciencespo Reims campus. Writing for theatre, soon for movie. 🌈 🇵🇱 she/her
Steven Seegel.. @steven_seegel
24K Followers 7K Following @UTCREEES professor • 🏳️‍🌈 🇪🇺 🇺🇸 • founder, The February 24th Archive • @AAUP • podcasts @NewBooksEEuro • Buffalonian • @ModEuroHistColl @H__Ukraine
History at EUI @EUI_History
6K Followers 679 Following Department of History @EUI_EU. We offer PhDs and Post Doctoral Fellowships focused on transnational and comparative European history
Tamara Scheer @ScheerTamara
4K Followers 4K Following Ottakringerin & Historian, Adjunct Professor University of Vienna & Project Head Pont. Institute Santa Maria dell'Anima Rome
John Paul Newman @johnpaul_newman
9K Followers 7K Following Historian of C19th/C20th Central and Southeastern Europe. Hanging around Dublin, Prague, Sofia, Zagreb.
Rachel G Trode @racheltrode
223 Followers 227 Following PhD Researcher (she/her) @EUI_History working on late Habsburg rule in Bosnia-Herzegovina. Retweets ≠ endorsement.
Rok Stergar @stergarr
2K Followers 850 Following Historian of the Habsburg Empire in the long 19th century; also WWI and history of nationalism. Associate Professor @FFLjubljana.
Ulrike Tanzer @UlrikeTanzer2
2K Followers 3K Following Forschungsinstitut Brenner-Archiv - Austrian Studies - Digital Humanities
Edin Hajdarpašić @_edinh
7K Followers 2K Following Historian; montažer; raw onion theorist; professor. Takes notes & writes on nationalism, empire, Balkans, conversion, etc. - https://t.co/EzlphUWIui
Helmut Smith @SmithHelmut
10K Followers 9K Following Prof. History. Germany, Holocaust, maps. Bks: Germany. A Nation in its Time (2020); The Butcher's Tale (2002), a few others. https://t.co/Uw5qmZelGD
Dominique Kirchner Re.. @DominiqueReill
4K Followers 691 Following Historian of Modern Europe, Italy, Croatia, Habsburg Empire, Mediterranean, LaGuardia, Nationalism, Migration. Author of https://t.co/BwwCRpMJoe + https://t.co/Q6GhEf1WcR she/her
Florian Wenninger @F_Wenninger
8K Followers 2K Following Head of the Institut für Historische Sozialforschung, Senior Research Fellow @univienna, researches Austrian contemporary and police history, here in a private capacity.
EUI Intellectual Hist.. @inthisEUI
3K Followers 1K Following The Intellectual History Working Group at @EUI_History @EUI_EU. Convenors @ArturBanaszews1 @LaurelinM
Dr Anne Luther @AnneLuthera
2K Followers 2K Following Founder and Director of the Institute for Digital Heritage and Principal Investigator, Digital Benin.
Visual and Material H.. @MaterialEUI
1K Followers 1K Following Researcher-led WG @EUI_History on material and visual sources in historical research. Run by @isabelleriepe @elisachazal @amberburbidge @F_Montuori94
Alex Drace-Francis @AlexDrace
3K Followers 1K Following European history @uva_Amsterdam. Find me at https://t.co/EWsmptbRTZ…
D. Anca Cretu @dacretu
2K Followers 3K Following Historian. Work on foreign aid, migration, Central & Eastern Europe. Sports (yes, I am a tennis and F1 nerd) & corny pop culture consumer. Content accordingly.
katharina prager @kathi_prager
5K Followers 2K Following historian. research/participation @wienbibliothek. interested in lifewriting, vienna1900, exile, gender. writes auto/biographically=own views
Maria Levchenko @taiga75
187 Followers 173 Following Digital Humanities Software Engineer | Bologna University
Theroosl @theroosl95770
1 Followers 158 Following
Monique Cynthia @MoniqCynt
21 Followers 5K Following
Julieta Sherfey @JulietaShe94439
87 Followers 5K Following
Siena Amsterdam @AmsterdamS43274
73 Followers 5K Following
Clyde Bambeck @ClydeBambe57736
50 Followers 5K Following
Min Dorshorst @dorshors_m
51 Followers 5K Following
INDEX: Institute for .. @Index_Ukraine
146 Followers 117 Following Cultural and research institution documenting Russia's war and facilitating exchange between Ukrainian and international intellectuals. Founded by @IWM_Vienna.
ShirelyCrumbliss @ShirelyCru8661
23 Followers 828 Following
Emaan Vanderlinde @EmaanVande58648
39 Followers 3K Following 🌐Emaan ~ 25 ~ Earn your own Crypto casino👇⚡
simpletrading @simpletrad17722
426 Followers 7K Following
Agnus Sonderman @AgnusSonde2697
60 Followers 5K Following
Liza Doughtry @DoughtryLi16224
42 Followers 5K Following
Vikram Pattabiraman @mgrvik
53 Followers 810 Following
Joannie Iba @JoannieI49921
64 Followers 5K Following
Mia @aMia_1990
810 Followers 3K Following Love photography, love nature, share my every day and present the charm of nature!📸
Nicolette Wittlin @wittlin60344
37 Followers 5K Following
Magaret Schuchman @MagaretSch81482
18 Followers 2K Following Magaret | 24 | Earn your own Crypt$ casino👇💰
Patricia Murrieta @patymurrieta
2K Followers 4K Following Co-director of #DigitalHumanities @LancsDigHum. See my work: @UnlockArchives @DiggingCH. #history #AI #spatialhumanities #GIS Also: mum of YubYub Commander 🐻🐾
Alexandra Halliday @HalliAlexand
38 Followers 5K Following
Alina Vanwoert @AVanwoer
39 Followers 5K Following
Birdie Holsten @HolstBirdi
74 Followers 5K Following
Pamelia Gilly @GillPameli
48 Followers 5K Following
Onie Legree @OLegree32662
43 Followers 5K Following
Hailey Schoelkopf @haileysch__
3K Followers 816 Following she/her | research scientist @aiEleuther | LLM training/infra, eval, data | LM Evaluation Harness maintainer
Caragh Prach @cara_prac
28 Followers 5K Following
Laci Jadlowiec @JadlowiL
72 Followers 5K Following
Abderrazak O. @AbderZac
87 Followers 859 Following Chief Data Officer @tbwacorporate - Lecturer @iscom - @sciencespo alumni
Tom Ashby @tomaashby
8K Followers 3K Following Historian • Co-org @gpolthought • Editorial Board @Global_IH • Postdoc Fellow @EURONEWSproject • PhD @EUI_EU • prev @EinaudiOnlus @qmHPT @KebleOxford • autistic
Olga Byrska @OByrska
3K Followers 5K Following PhD researcher in post-WW2 intellectual history at the @EUI_EU. Teaching at @sciencespo Reims campus. Writing for theatre, soon for movie. 🌈 🇵🇱 she/her
History at EUI @EUI_History
6K Followers 679 Following Department of History @EUI_EU. We offer PhDs and Post Doctoral Fellowships focused on transnational and comparative European history
Tamara Scheer @ScheerTamara
4K Followers 4K Following Ottakringerin & Historian, Adjunct Professor University of Vienna & Project Head Pont. Institute Santa Maria dell'Anima Rome
Europeana @Europeanaeu
44K Followers 2K Following Europe’s cultural heritage online https://t.co/AR27Bs946w Resources for cultural professionals https://t.co/jNWbuMtHLS Funded by the European Union
Transkribus @Transkribus
11K Followers 4K Following The AI-powered platform to unlock history. AI text & layout recognition, transcription, searching & publishing of historical documents.
EUI Researchers' Unio.. @EUIResUnion
999 Followers 63 Following Grassroots association that represents PhD researchers at the European University Institute
Rok Stergar @stergarr
2K Followers 850 Following Historian of the Habsburg Empire in the long 19th century; also WWI and history of nationalism. Associate Professor @FFLjubljana.
Edin Hajdarpašić @_edinh
7K Followers 2K Following Historian; montažer; raw onion theorist; professor. Takes notes & writes on nationalism, empire, Balkans, conversion, etc. - https://t.co/EzlphUWIui
Helmut Smith @SmithHelmut
10K Followers 9K Following Prof. History. Germany, Holocaust, maps. Bks: Germany. A Nation in its Time (2020); The Butcher's Tale (2002), a few others. https://t.co/Uw5qmZelGD
Dominique Kirchner Re.. @DominiqueReill
4K Followers 691 Following Historian of Modern Europe, Italy, Croatia, Habsburg Empire, Mediterranean, LaGuardia, Nationalism, Migration. Author of https://t.co/BwwCRpMJoe + https://t.co/Q6GhEf1WcR she/her
Manuel Burghardt @8urghardt
2K Followers 670 Following Head of the Computational Humanities Group at Leipzig University, https://t.co/geT4lhCpEy
Marko Demantowsky @MDemantowsky
1K Followers 394 Following professor of public history @pubhistvienna and then there this and that
Roopika Risam, PhD @roopikarisam
20K Followers 9K Following Associate Prof. @Dartmouth Digital Humanities & Social Engagement, formerly @SalemState, edits #ReviewsInDH, Higher Ed Editor @PublicBooks, @DEFConsortium PI
EUI Intellectual Hist.. @inthisEUI
3K Followers 1K Following The Intellectual History Working Group at @EUI_History @EUI_EU. Convenors @ArturBanaszews1 @LaurelinM
thomas cauvin @thomascauvin
4K Followers 924 Following Associate Professor of #Publichistory at @C2DH, International Federation for PH (@pubhisint). Author of "Public history: A Textbook of Practice" Views my own.
Visual and Material H.. @MaterialEUI
1K Followers 1K Following Researcher-led WG @EUI_History on material and visual sources in historical research. Run by @isabelleriepe @elisachazal @amberburbidge @F_Montuori94
Alex Drace-Francis @AlexDrace
3K Followers 1K Following European history @uva_Amsterdam. Find me at https://t.co/EWsmptbRTZ…
SUCHO (Saving Ukraini.. @sucho_org
772 Followers 40 Following Grassroots initiative supporting the digital preservation of Ukrainian cultural heritage. Tweets by @storytracer, @quinnanya & @anna_kijas.
ParaCrawl @ParaCrawl
246 Followers 7 Following
INDEX: Institute for .. @Index_Ukraine
146 Followers 117 Following Cultural and research institution documenting Russia's war and facilitating exchange between Ukrainian and international intellectuals. Founded by @IWM_Vienna.
Tadhg Fleming @tadhgfleming_
12K Followers 14 Following 📍Kerry | Ireland 🇮🇪 • Sharing a bitta Craic 💚 • Content Creator📲 • Messer with a phone 🤣 • Links to other stuff ⤵️ https://t.co/qTpMwv5P7v
MotherDuck @motherduck
5K Followers 121 Following Making analytics fun, frictionless and ducking awesome with a serverless easy-to-use data analytics platform based on @DuckDB in collab with @duckdblabs.
James Wright @jms_wright
2K Followers 3K Following Programme specialist at @UNESCO working on AI ethics and governance. Anthropology/STS research on care and tech. Author of "Robots Won't Save Japan". Own views
HPLT @hplt_eu
219 Followers 15 Following Horizon Europe - High Performance Language Technology (HPLT)
Europeana Copyright @EuropeanaIPR
2K Followers 472 Following We cultivate, curate & share knowledge around the topic of copyright in the cultural heritage sector #copyright #PublicDomain #openGLAM
Colin Raffel @colinraffel
30K Followers 655 Following nonbayesian parameterics, sweet lessons, and random birds. Friend of @srush_nlp
clem 🤗 @ClementDelangue
91K Followers 5K Following Co-founder & CEO @HuggingFace 🤗, the open and collaborative platform for AI builders
Hailey Schoelkopf @haileysch__
3K Followers 816 Following she/her | research scientist @aiEleuther | LLM training/infra, eval, data | LM Evaluation Harness maintainer
Aviya Skowron @aviskowron
336 Followers 481 Following they/them. Head of Policy and Ethics @AiEleuther. Find me in the EleutherAI Discord to chat. Always looking for ways to weave philosophy into my job.
Vlad Vexler @VladVexler
17K Followers 841 Following Philosopher - ethics, politics, music | Slowly writing a book on Isaiah Berlin | Born in Russia, home is London | Living with ME since 2003. | 🔗 🎥
Robin Rombach @robrombach
6K Followers 398 Following Generative enthusiast and long-term PhD Student @LMU_Muenchen. Author of VQGAN, Latent Diffusion, Stable Diffusion.
Kate Knibbs 🏄🏻.. @Knibbs
15K Followers 2K Following senior writer at Wired 🦐 Story tips: [email protected] (or DM me for Signal/WhatsApp!) extremeknibbs on Insta/Threads and @knibbs.bsky.social
Giulia Priora @giuliapriora
2K Followers 967 Following Director @nova_ipsi | Assistant Professor @NOVAunl School of Law | intellectual property, IP & sustainability, copyright law, distributive justice
Maarten Zeinstra @mzeinstra
584 Followers 486 Following Philosopher of Technology and Intellectual Property Lawyer. Owner at https://t.co/XDhYiDDL47
communia @communia_eu
3K Followers 843 Following The Communia Association works to strengthen the public domain.
Jamie Folsom @jamiefo.. @jamiefolsom
1K Followers 2K Following Partner, Performant Software Solutions @[email protected] https://t.co/eBPwQXRaXe
Albert Villanova @avillanovamoral
2K Followers 5K Following ML Engineer @huggingface. Data Scientist, PhD Theoretical Particle Physics, BSc Computer Science. Always learning. he/him
Stella Biderman @BlancheMinerva
15K Followers 748 Following Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her
Allen Institute for A.. @allen_ai
54K Followers 361 Following AI for the Common Good. › Join us: https://t.co/DqTs1G4bGO › Get our newsletter: https://t.co/tvb1VpySfL
Luca Soldaini 🎀 @soldni
6K Followers 1K Following I like tokens! Lead for OLMo data team at @allen_ai (Dolma 🍇), OSS is fun, @QueerInAI organizer 🤖☕️🍕they/them (views mine, not my employer’s)
brian foo @beefoo
1K Followers 127 Following senior innovation specialist @librarycongress. previously at @amnh, @nypl_labs. views my own
LC Labs @LC_Labs
12K Followers 12 Following Official account for news about https://t.co/lctN8kOfYR & digital strategy & collections of the @LibraryCongress. All Library accounts: https://t.co/Fie2HgyZlE
Torsten Hiltmann @TorstenHiltmann
2K Followers 1K Following Digital History, Humboldt Universität zu Berlin. And now officially here too: @[email protected]
Cultural AI Lab @cultural_ai
1K Followers 569 Following Researching AI for human culture. By @CWInl @KNAWHuC @VUAmsterdam @UvA_Amsterdam @benglabs @KBNLresearch @rijksmuseum & Nationaal Museum van Wereldculturen
N3XTCODER @N3XTCODER
690 Followers 1K Following N3XTCODER is more than code. We help to develop meaningful digital products in new ways.
Steven Claeyssens @sclaeyssens
1K Followers 964 Following curator of digital collections @KB_Nederland | https://t.co/oIIvPa4umy | editor @NbvBoekhist | https://t.co/NrJt13q63R | @DHBenelux | members council @europeanaeu | B in NL
Daring Fireball @daringfireball
95K Followers 1 Following Entries from Daring Fireball. Feel free to comment with replies to posts.
Marcus Bitzl @MarcusBitzl
156 Followers 695 Following Artist. Food Lover. Hacker. Crafter. Open Access. Fervent believer in democracy and the equality of all people. (he/him) https://t.co/Wjxzn22tI9
DHI Paris @dhiparis@w.. @dhiparis
7K Followers 6K Following Research, outreach, training: tweeting for the German Historical Institute Paris are Mareike König, Leonard Dorn, Corentin Marion and Theresa Finger.
Protomaps @protomaps
2K Followers 71 Following the free and open source map foundry | tweets by @bdon
Maxime Durand @TriFreako
2K Followers 663 Following World-Design Director @Ubisoft. Prev. director of Discovery Tour & historian for @AssassinsCreed. Not using X much. he/him
Bob Whitaker @WhitakerAlmanac
1K Followers 1K Following Professor of History @collincollege. Empire, International Crime/Policing, & video games. Creator of @historyrespawn. Not communicating on behalf of employer.
Archaeogaming @Archaeogaming
3K Followers 308 Following ⛏🕹 Dedicated to exploring archaeology both of and in games. The book is out now! https://t.co/rmcqL19gQR #archaeogaming.
Holly Nielsen @nielsen_holly
8K Followers 908 Following Historian of play | AHRC funded PhD | Writer & narrative designer- Neurocracy, Devolver Tumble Time, TBA things | Former journalist & critic | she/her
Aya Bochman @ayabo66
260 Followers 195 Following Co-Founder @fashn_ai building a virtual try-on platform.
Geovistory @Geovistory
54 Followers 6 Following A Virtual Research Environment and Data Publication Platform for the Humanities and Social Sciences we tweet as @KleioLab, @DH_unibe and @arhn_larhra
Carl Benedikt Frey @carlbfrey
5K Followers 500 Following Prof @oiioxford Director, Future of Work @oxmartinschool at Oxford University. Author of The Technology Trap (@PrincetonUPress, 2019)
Steve Jobs Archive @SJArchive
6K Followers 0 Following The Steve Jobs Archive is the authoritative home for Steve’s story and a resource for new generations eager to make their own mark.
paper trail media @paper_trail_m
6K Followers 228 Following Investigative newsroom | partner of @derspiegel @zdf @derStandardat @tamedia | collaborating with @occrp @fbdnstories @icijorg @acdatacollectiv @examinationnews
Christo Buschek @christo_buschek
880 Followers 187 Following Data, computations and investigations | #pulitzer prize 2021 | @derspiegel | @paper_trail_m | @NYUEngelberg | https://t.co/oAON5Z60Pq | [email protected]
AI4Culture @AI4Culture
53 Followers 13 Following Online hub for the application of AI technologies in the Cultural Heritage sector - a project co-funded by @EU_HaDEA
taod @the_art_of_data
86 Followers 211 Following We are dataful minds ## Data & Analytics from Köln-Ehrenfeld ## Agile mindset for strategic consulting, implementation and analytics trainings ## NOW HIRING!!!
I've shipped most of the models + libraries I wanted in the last few months:
- PDF to markdown - marker
- Text line detection, OCR in 93 languages, layout analysis, reading order - surya
- Equation to LaTeX
- PDF text extraction
Find them on GitHub - github.com/VikParuchuri/.
I am delighted to be starting as a Postdoctoral Research Fellow tomorrow @EURONEWSproject with @BrendanDooley & @stefanovil - I will work (more) on Amerigo Salvetti (1572-1657) & Giovanni Salvetti Antelminelli (1636-1716), Florentine residents in London, focussing on c.1640-1660
I've met so many people in SF who have no idea the @internetarchive HQ is in the city and that they offer free tours on Fridays. Took the @internet_pipes crew there last week and it was one of the coolest tours I've ever done. I learned so much: • The non-profit has archived…
Finished my first full year of teaching! ✨✨✨ this is such a hard and rewarding job.
I'm thinking of building conversational LLMs along these lines, but "rather than" is insane. At the absolute best it would be a new way to interact with the text.
Soon, students will have CONVERSATIONS with Aristotle, Socrates, or Plato rather than just reading about them. This is how AI is REVOLUTIONIZING education.
Inspired by @simonw, I packaged whisper.cpp on PyPI, with pre-built binaries for macOS and Linux. So you can install and run Whisper with pip or uv -- nothing else required. Get Whisper running in ~200ms with uv.
So, GPU-poor Sunday. Nothing available in my usual places. Guess I will have to go for a walk or something.
The model and its design can be freely reused for similar RAG use cases, and it is itself intended as a contribution to the wider open LLM research ecosystem.
Our conviction at Pleias is that current LLMs cannot exist in isolation. They have to be integrated into wider knowledge infrastructure, leveraging the existing resources but also the wider stack of knowledge technologies.
@simonw MacWhisper can run files in bulk. Just select multiple files as input.
On Earth Day in Stockholm, I received the 2024 Vega-Medal from King Carl XVI Gustaf of Sweden for my work on Ukraine and the history of human geography, earth sciences, and critical cartography. With @tessmegginson whose work on #maps you should follow! I'm grateful and humbled.
Congratulations @steven_seegel!! What a wonderful and richly deserved honor! #Cartography #geography #maps
Superb 3D GIS project mapping the fortified city of Livorno in Italy as it was in 1646 and combining this with a 1646 census of the population so that you can click on buildings and bring up a linked database entry on the occupants - clever stuff from @DECIMAUoT 👍🏻
We have an exciting announcement here at DECIMA, the launch of Livorno3D - the newest stage in our digital mapping work! experience.arcgis.com/experience/b7b… 1/6
@Tagishsimon @ostrisai Yes, you're right. @storytracer also recommended it to me, but I haven't had the time to use it yet.