Missing Posts in OGD Archive


Planetperson

Hey everyone,
 
I'm hoping that you can help me solve a little mystery. If you don't remember, I run a site called the "Official Greg Discussion Archive" (https://greg.thegreatarchives.com/), where I reconstructed the contents of the long-lost Official Greg Discussion topics from a series of text files saved by fishers64. Those text files used to be available for download here, although it looks like they are no longer available. I can provide them if necessary.

 

My problem is this: in the second OGD topic (the "Official Greg Dialogue"), the text file only contains enough posts for 333 pages, but the original topic definitely had 334. The number 334 is corroborated by the last entry of the Farshtey Feed, as well as snapshots from web.archive.org as late as 2013. I'm trying very hard to determine whether there is missing content in the text file, the formatting of the text file is inconsistent, the code I use to process the file is incorrect, or something else. Most importantly, I'm trying to determine if there is something I can do to fix it.

 

I have tried to figure out the underlying cause of this discrepancy from a number of different angles, but I haven't been able to come up with an explanation. Here is what I know that it isn't:

  • Posts being deleted over time. If you look at web.archive.org snapshots of the first OGD topic at different points in time, you can see that a number of spammy posts were deleted throughout the years (by admins? I don't know.). However, as I mentioned before, the second topic still had 334 pages as of 2013, which is around the time the text file was created, so that does not explain it.
  • The loss of all data between late 2008 and early 2009. Again, the fact that the topic had 334 pages in 2013 disproves this.
  • Spurious "NEW POST" markers in the text file. The text file marks new posts with the text "NEW POST". There are some spurious "NEW POST" markers that have no content after them, and my code discards those. However, there are only 6 of these, which is not enough to reach 334 pages, so this does not explain it.
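A minimal sketch of the counting logic described in the last bullet, for anyone who wants to double-check the numbers independently. It is not the archive's actual code. The function names are made up for illustration, and `posts_per_page` is a parameter you must supply yourself, since the forum's page size is not stated in this post.

```python
import math

MARKER = "NEW POST"  # the marker string the text file uses between posts


def split_posts(text, marker=MARKER):
    """Split the raw text on the marker and drop chunks with no content."""
    chunks = text.split(marker)
    # Anything before the first marker is not a post, so skip chunks[0].
    return [chunk.strip() for chunk in chunks[1:] if chunk.strip()]


def count_spurious(text, marker=MARKER):
    """Count markers that are followed only by whitespace."""
    return sum(1 for chunk in text.split(marker)[1:] if not chunk.strip())


def page_count(num_posts, posts_per_page):
    """Number of pages needed to show num_posts (assumed fixed page size)."""
    return math.ceil(num_posts / posts_per_page)


sample = "NEW POST\nhello\nNEW POST\n\nNEW POST\nworld\n"
posts = split_posts(sample)
spurious = count_spurious(sample)
```

One caveat with this approach: a post whose body happens to contain the literal text "NEW POST" would be split in two, which would inflate the count rather than shrink it.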

Here are some examples of discrepancies between my archive and the snapshots on web.archive.org. The format is:

<archive post #> = <original post #> (<difference between the two>) (<url> | <url>).

Any ideas? I would be happy to give anyone who helps me figure this out credits on the site.

 

BTW, I have a number of enhancements for the site planned. Stay tuned.

 

Here is a working list of snapshots on web.archive.org. If you come across any new ones, please let me know.

Edited by Planetperson

(At least) one page with missing posts is present on the Wayback Archive.

It's page 45: Great Archives | Wayback Archive

 

The difference (2 messages) is around post #1783. I have no idea why they aren't in the archive; there doesn't seem to be anything abnormal about them.

Keep in mind that if Star Trek fans had, as a group, said, "No point in talking about this anymore, it's never going to come back," it never WOULD have come back.

-- Greg Farshtey


Excellent, that helps! It's hard to find direct evidence of posts being deleted. There's even another post quoting one of the posts that was deleted. If only we could see whether those posts still existed in 2013. If they did, it would prove that the text file is missing content.


I guess the only way to really know would be to get an archive from the BZP admins. It might be worth a try to ask them.

 

Aside from that, I checked several other archiving sites (Common Crawl & Archive.is) and none seems to have enough data to say if these posts still existed at the time of the crawl.


Honestly, it would be really helpful just to have a complete list of all of the OGD snapshots that are available. I can start a list in the first post.

 

Searching for http://www.bzpower.com:80/forum/index.php?showtopic=275890* with a star at the end is a good starting point, but for some reason, it does not return all of the snapshots that are available on web.archive.org. It's possible to find snapshots that do not appear in the search results.
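As a possible alternative to the wildcard search on the website, the Wayback Machine also exposes a CDX query endpoint that can list captures under a URL prefix. The sketch below only builds such a query URL; it does not fetch anything. The helper name `build_cdx_query` and the choice of fields are assumptions for illustration, and whether this endpoint returns the snapshots missing from the web search has not been verified here.

```python
from urllib.parse import urlencode

# Wayback Machine CDX endpoint for listing captures.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_prefix, limit=None):
    """Build a CDX query URL listing captures whose URL starts with url_prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",  # match every URL beginning with url_prefix
        "output": "json",       # ask for JSON rows instead of plain text
        "fl": "timestamp,original,statuscode",  # fields to return
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


query = build_cdx_query("bzpower.com/forum/index.php?showtopic=275890")
print(query)
```

Opening the printed URL in a browser should list one row per capture, which could then be compared against the list in the first post.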

Edited by Planetperson

I began making a list of the most recent ones, then saw that you had updated your first post with practically every snapshot. I will list only what I found that isn't in your list, and only the most recent snapshot of each page, since several pages have dozens of snapshots.

Also, the snapshots for pages 197 & 207 are from December 2008, which means they are not in the backup. (There is a gap of several months at the end of 2008/beginning of 2009 due to a BZP crash.)

Edited by maxim21