Planetperson, posted December 21, 2018 (edited)

Hey everyone, I'm hoping that you can help me solve a little mystery. If you don't remember, I run a site called the "Official Greg Discussion Archive" (https://greg.thegreatarchives.com/), where I reconstructed the contents of the long-lost Official Greg Discussion topics from a series of text files saved by fishers64. Those text files used to be available for download here, although it looks like they are no longer available. I can provide them if necessary.

My problem is this: in the second OGD topic (the "Official Greg Dialogue"), the text file only contains enough posts for 333 pages, but the original topic definitely had 334. The number 334 is corroborated by the last entry of the Farshtey Feed, as well as snapshots from web.archive.org as late as 2013. I'm trying very hard to determine whether there is missing content in the text file, the formatting of the text file is inconsistent, the code I use to process the file is incorrect, or something else. Most importantly, I'm trying to determine if there is something I can do to fix it.

I have tried to figure out the underlying cause of this discrepancy from a number of different angles, but I haven't been able to come up with an explanation. Here is what I know that it isn't:

- Posts being deleted over time. If you look at web.archive.org snapshots of the first OGD topic at different points in time, you can see that a number of spammy posts were deleted throughout the years (by admins? I don't know.). However, as I mentioned before, the second topic still had 334 pages as of 2013, which is around the time the text file was created, so that does not explain it.
- The loss of all data between late 2008 and early 2009. Again, the fact that the topic had 334 pages in 2013 disproves this.
- Spurious "NEW POST" markers in the text file. The text file marks new posts with the text "NEW POST". There are some spurious "NEW POST" markers that have no content after them, and my code discards those. However, there are only 6 of these, which is not enough to reach 334 pages, so this does not explain it.

Here are some examples of discrepancies between my archive and the snapshots on web.archive.org. The format is:

<archive post #> = <original post #> (<difference between the two>) (<url> | <url>)

120 = 120 (0; perfect match) (https://greg.thegreatarchives.com/2008-2010/page3#post120 | http://web.archive.org/web/20080421073524/http://www.bzpower.com:80/forum/index.php?showtopic=275890&st=80&start=80)
1641 = 1642 (-1) (https://greg.thegreatarchives.com/2008-2010/page42#post1641 | http://web.archive.org/web/20080506125838/http://www.bzpower.com:80/forum/index.php?showtopic=275890&st=1640)
1797 = 1800 (-3; direct evidence of 2 posts being deleted) (https://greg.thegreatarchives.com/2008-2010/page45#post1797 | https://web.archive.org/web/20080508084433/http://www.bzpower.com:80/forum/index.php?showtopic=275890&st=1760&p=5298879&)
2566 = 2601 (-35) (https://greg.thegreatarchives.com/2008-2010/page65#post2566 | http://web.archive.org/web/20080421073528/http://www.bzpower.com/forum/index.php?showtopic=275890&pid=5372847&st=2600&&do=findComment&comment=5372847)
3982 = 4022 (-40) (https://greg.thegreatarchives.com/2008-2010/page100#post3982 | http://web.archive.org/web/20080628174902/http://www.bzpower.com/forum/index.php?showtopic=275890&pid=5573259&st=4000&&do=findComment&comment=5573259)
4302 = 4345 (-43) (https://greg.thegreatarchives.com/2008-2010/page108#post4302 | http://web.archive.org/web/20080714140349/http://www.bzpower.com/forum/index.php?showtopic=275890&pid=5628367&st=4320&&do=findComment&comment=5628367)
13123 = 13183 (-60) (https://greg.thegreatarchives.com/2008-2010/page329#post13123 | http://web.archive.org/web/20100811083430/http://www.bzpower.com/forum/index.php?showtopic=275890&pid=7021553&st=13160&&do=findComment&comment=7021553)

Any ideas?
I would be happy to give anyone who helps me figure this out credits on the site. BTW, I have a number of enhancements for the site planned. Stay tuned.

Here is a working list of snapshots on web.archive.org. If you come across any new ones, please let me know.

- page 1 / post 1-40 (Apr 2008 - Jan 2013)
- page 1 / post 1-40 (Apr 2008)
- page 1 / post 1-40 (Apr 2008 - Aug 2010)
- page 1 / post 1-100 (Apr 2008)
- page 1 / post 1-100 (Apr 2008)
- page 1 / post 1-100 (Feb 2010)
- page 1 / post 1-100 (Feb 2010)
- page 2 / post 41-80 (Apr 2008)
- page 3 / post 81-120 (Apr 2008)
- page 42 / post 1641-1680 (May 2008)
- page 43 / post 1681-1720 (Apr 2008)
- page 44 / post 1721-1760 (Apr 2008)
- page 44 / post 1721-1760 (May 2008)
- page 44 / post 1721-1760 (Jun 2008)
- page 45 / post 1761-1800 (Apr 2008 - Jun 2008)
- page 46 / post 1801-1840 (Apr 2008)
- page 47 / post 1841-1880 (May 2008)
- page 50 / post 1961-2000 (May 2008)
- page 51 / post 2001-2040 (Apr 2008)
- page 52 / post 2041-2080 (Apr 2008 - Jun 2008)
- page 53 / post 2081-2120 (Apr 2008)
- page 54 / post 2121-2160 (Apr 2008)
- page 55 / post 2161-2200 (Apr 2008 - Jun 2008)
- page 55 / post 2161-2200 (Apr 2008)
- page 56 / post 2201-2240 (Apr 2008 - Jun 2008)
- page 57 / post 2241-2280 (Apr 2008 - Jun 2008)
- page 58 / post 2281-2320 (Apr 2008)
- page 59 / post 2321-2360 (May 2008)
- page 60 / post 2361-2400 (Apr 2008)
- page 62 / post 2441-2500 (Apr 2008 - Jun 2008)
- page 63 / post 2481-2520 (Apr 2008 - Jun 2008)
- page 64 / post 2521-2580 (Apr 2008 - Jun 2008)
- page 65 / post 2561-2600 (Apr 2008)
- page 66 / post 2601 (Apr 2008) (new post)
- page 66 / post 2601-2640 (Apr 2008)
- page 67 / post 2641-2700 (May 2008)
- page 68 / post 2681-2720 (Apr 2008 - Jun 2008)
- page 69 / post 2721-2760 (Apr 2008 - Jun 2008)
- page 70 / post 2761-2800 (Apr 2008 - Jun 2008)
- page 71 / post 2801-2840 (Apr 2008)
- page 75 / post 2961-3000 (May 2008)
- page 76 / post 3001-3040 (May 2008 - Jun 2008)
- page 77 / post 3041-3080 (May 2008 - Jun 2008)
- page 78 / post 3081-3120 (May 2008)
- page 79 / post 3121-3160 (May 2008 - Jun 2008)
- page 80 / post 3161-3200 (May 2008 - Jun 2008)
- page 81 / post 3201-3240 (May 2008)
- page 82 / post 3241-3280 (May 2008 - Jun 2008)
- page 83 / post 3281-3320 (May 2008 - Jun 2008)
- page 87 / post 3441-3480 (May 2008)
- page 92 / post 3641-3680 (Jun 2008)
- page 101 / post 4001-4022 (Jun 2008) (new post)
- page 109 / post 4321-4345 (Jul 2008) (new post)
- page 116 / post 4601-4640 (Jul 2008 - Aug 2008) (evidence of post # decreasing by 1)
- page 117 / post 4641-4680 (Aug 2008)
- page 146 / post 5801-5840 (Sep 2008, multiple snapshots)
- page 171 / post 6801-6806 (Oct 2008) (new post)
- page 207 / post 8241-8255 (Dec 2008) (new post)
- page 250 / post 9961-10000 (Feb 2010)
- page 330 / post 13161-13181 (Aug 2010) (new post)

Edited December 24, 2018 by Planetperson

July 2009 Comic Scans | MNOLG Soundtrack
Official Greg Discussion Weekly Digest | My Lego Network
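
For anyone who wants to reproduce the page arithmetic from the first post, here is a minimal Python sketch. It is not the code the archive site actually uses. It assumes 40 posts per page (as the "page 1 / post 1-40" snapshots suggest), that every post in the text file is introduced by the literal marker "NEW POST", and that any text before the first marker would also count as a post. The function names are made up for illustration.

```python
import math

POSTS_PER_PAGE = 40   # assumption, inferred from "page 1 / post 1-40"
MARKER = "NEW POST"   # marker text described in the first post


def split_posts(raw_text: str) -> list[str]:
    """Split the dump on the marker and drop chunks with no content.

    Caveat: a post whose body happens to contain the marker text
    would be split in two by this naive approach.
    """
    chunks = raw_text.split(MARKER)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def page_count(post_count: int, per_page: int = POSTS_PER_PAGE) -> int:
    """Number of pages needed to display `post_count` posts."""
    return math.ceil(post_count / per_page)


def post_range_for_pages(pages: int, per_page: int = POSTS_PER_PAGE) -> tuple[int, int]:
    """Smallest and largest post counts that fill exactly `pages` pages."""
    return (pages - 1) * per_page + 1, pages * per_page
```

Under these assumptions, `post_range_for_pages(333)` gives (13281, 13320) and `post_range_for_pages(334)` gives (13321, 13360), so a file that fills 333 pages is short by somewhere between 1 and 79 posts relative to a 334-page topic.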
maxim21, posted December 22, 2018

(At least) one page with missing posts is present on the Wayback archive. It's page 45: Great Archives | Wayback Archive

The difference (2 messages) is around post #1783. I have no idea why they aren't in the archive; there doesn't seem to be anything abnormal about them.

Keep in mind that if Star Trek fans had, as a group, said, "No point in talking about this anymore, it's never going to come back," it never WOULD have come back.
-- Greg Farshtey
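
The comparison points listed in the first post can also be used to narrow down where the gap grows. The sketch below is only an illustration: it takes the (archive post #, original post #) pairs from that list and reports how many additional posts go missing between each pair of consecutive samples. It assumes the original post numbers from snapshots taken at different dates are directly comparable, which may not hold if posts were deleted between snapshots.

```python
# (archive post #, original post #) pairs copied from the first post
SAMPLES = [
    (120, 120),
    (1641, 1642),
    (1797, 1800),
    (2566, 2601),
    (3982, 4022),
    (4302, 4345),
    (13123, 13183),
]


def gap_growth(samples):
    """Return (from_original, to_original, newly_missing) per consecutive pair."""
    rows = []
    for (prev_arch, prev_orig), (arch, orig) in zip(samples, samples[1:]):
        prev_gap = prev_orig - prev_arch   # posts missing up to the earlier sample
        gap = orig - arch                  # posts missing up to the later sample
        rows.append((prev_orig, orig, gap - prev_gap))
    return rows


for start, end, lost in gap_growth(SAMPLES):
    print(f"original posts {start + 1}-{end}: {lost} more missing")
```

With these samples, the 2 posts on page 45 fall in the 1643-1800 interval, and the largest single jump (32 posts) falls between original posts 1801 and 2601, so that stretch looks like a reasonable place to compare snapshots first.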
Planetperson, posted December 22, 2018

maxim21 wrote: "(At least) one page with missing posts is present on the Wayback archive. It's page 45: Great Archives | Wayback Archive. The difference (2 messages) is around post #1783. I have no idea why they aren't in the archive; there doesn't seem to be anything abnormal about them."

Excellent, that helps! It's hard to find direct evidence of posts being deleted. There's even another post quoting one of the posts that was deleted. If only we could see whether those posts still existed in 2013. If they did, it would prove that the text file is missing content.
maxim21, posted December 23, 2018

I guess the only way to really know would be to get an archive from the BZP admins. It might be worth a try to ask them.

Aside from that, I checked several other archiving sites (Common Crawl & Archive.is) and none seems to have enough data to say if these posts still existed at the time of the crawl.
Planetperson, posted December 24, 2018 (edited)

Honestly, it would be really helpful just to have a complete list of all of the OGD snapshots that are available. I can start a list in the first post.

Searching for http://www.bzpower.com:80/forum/index.php?showtopic=275890* with a star at the end is a good starting point, but for some reason, it does not return all of the snapshots that are available on web.archive.org. It's possible to find snapshots that do not appear in the search results.

Edited December 24, 2018 by Planetperson
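
One possible way to build such a list programmatically is the Wayback Machine's CDX server, which accepts a `matchType=prefix` parameter. The sketch below is a starting point, not a verified solution to the missing-results problem: it builds a prefix query, parses the JSON response, keeps only URLs whose `showtopic` is 275890, and derives the page number from the `st` offset assuming 40 posts per page. Querying the broader `index.php` prefix and filtering on the client side is a guess at a workaround, on the unverified assumption that the index may store query parameters in a different order than the original URL. The function names are made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"
FORUM_PREFIX = "bzpower.com/forum/index.php"  # broad prefix, filtered below
TOPIC_ID = "275890"
POSTS_PER_PAGE = 40  # assumption, inferred from "page 1 / post 1-40"


def build_cdx_query(prefix: str = FORUM_PREFIX) -> str:
    """Build a CDX prefix query that returns timestamp and original URL."""
    params = {
        "url": prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(payload: str) -> list[dict]:
    """Turn the JSON array-of-arrays (first row = field names) into dicts."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def is_topic_url(url: str, topic_id: str = TOPIC_ID) -> bool:
    """True if the URL's showtopic parameter matches the OGD topic."""
    return parse_qs(urlsplit(url).query).get("showtopic") == [topic_id]


def page_number(url: str, per_page: int = POSTS_PER_PAGE) -> int:
    """Page implied by the `st` offset; a missing `st` is treated as page 1."""
    values = parse_qs(urlsplit(url).query).get("st")
    return int(values[0]) // per_page + 1 if values else 1


def fetch_topic_snapshots() -> list[dict]:
    """Query the CDX server (network access required) and filter by topic."""
    with urlopen(build_cdx_query()) as response:
        rows = parse_cdx_rows(response.read().decode("utf-8"))
    return [row for row in rows if is_topic_url(row["original"])]
```

The broad prefix may return a very large result set for the whole forum, so narrowing it or paging through results may be necessary in practice.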
maxim21, posted December 24, 2018 (edited)

I began making a list of the most recent snapshots and then saw that you had updated your first post with practically every single one. I will only list what I found that isn't in your list, and only the most recent snapshot of each page, since several pages have dozens of snapshots.

- page 5 / post 161-200 (May 2008)
- page 8 / post 281-320 (May 2008)
- page 21 / post 801-840 (Jun 2009)
- page 23 / post 881-920 (Jun 2009)
- page 32 / post 1241-1280 (Jun 2009)
- page 37 / post 1441-1480 (Jun 2009)
- page 101 / post 4001-4040 (Jul 2008)
- page 112 / post 4441-4480 (Oct 2008)
- page 192 / post 7641-7641 (Jun 2009) (new post)
- page 197 / post 7841-7880 (Dec 2008)

Also, the snapshots of pages 197 and 207 are from December 2008, which means they are not in the backup. (There is a gap of several months at the end of 2008 and the beginning of 2009 due to a BZP crash.)

Edited December 24, 2018 by maxim21
Iruini Nuva, posted December 30, 2018

Minor tangent. You have no idea how happy your choice of domain name makes me. Nice call. Carry on.

Makuta: Consumed By Light • Rebrick Entry • Topic & Backstory • Blog
-----------------
2015 Sets: 18/18 + 3 • Polybags: 1/2 • SDCC x2, NYCC Clear MoF, Trans-MoF
2016 Sets: 17/17 + 6