Make a script that writes a CSV file, acc_avail_20260404.csv, with two columns:
pmid, dir
The folder .cache/pubmed has files. Remove the extension from each filename → this is the pmid value. Then look in the folders pmc (.xml), scidownl (.pdf), and playwright (.pdf).
Find a file in one of the 3 folders whose name, with the extension removed, matches the pmid; the dir value is then that folder's name. There should only be one match. If there are multiple, join the dir values with a vertical bar, for example "scidownl|playwright". For pmids that still have no match, leave dir blank in the CSV.
In playwright there are also files like PMID_NONE.pdf; for these the dir value will be NONE.
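
A rough sketch of what such a script could look like, to make the matching rules concrete. Several things here are assumptions, not stated in the request: that pmc, scidownl and playwright sit next to pubmed under .cache; that "PMID_NONE.pdf" means the pmid followed by "_NONE" (e.g. 12345_NONE.pdf); and that a real match in any folder takes precedence over a NONE marker. The function names build_rows, find_dirs, stems and write_csv are made up for illustration.

```python
import csv
from pathlib import Path

# Assumption: these folders live next to "pubmed" under the cache root.
# Dict order fixes the join order, e.g. "scidownl|playwright".
SOURCES = {"pmc": ".xml", "scidownl": ".pdf", "playwright": ".pdf"}
# Assumption: "PMID_NONE.pdf" means "<pmid>_NONE.pdf".
NONE_SUFFIX = "_NONE"


def stems(folder, ext=None):
    """Return filenames in folder with the extension removed.

    If ext is given, only files with that extension are considered.
    A missing folder yields an empty set; hidden files are skipped.
    """
    if not folder.is_dir():
        return set()
    return {
        p.stem
        for p in folder.iterdir()
        if p.is_file()
        and not p.name.startswith(".")
        and (ext is None or p.suffix.lower() == ext)
    }


def find_dirs(pmid, available):
    """Return the dir value for one pmid."""
    hits = [name for name in SOURCES if pmid in available[name]]
    if hits:
        # Normally one hit; several are joined with a vertical bar.
        return "|".join(hits)
    if pmid + NONE_SUFFIX in available["playwright"]:
        return "NONE"
    return ""  # still missing: leave blank


def build_rows(cache_root):
    """Build (pmid, dir) rows for every file in <cache_root>/pubmed."""
    cache_root = Path(cache_root)
    available = {
        name: stems(cache_root / name, ext) for name, ext in SOURCES.items()
    }
    pmids = sorted(stems(cache_root / "pubmed"))
    return [(pmid, find_dirs(pmid, available)) for pmid in pmids]


def write_csv(rows, out_path):
    """Write the rows to a CSV file with a pmid,dir header."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["pmid", "dir"])
        writer.writerows(rows)


if __name__ == "__main__":
    write_csv(build_rows(".cache"), "acc_avail_20260404.csv")
```

Run it from the folder that contains .cache. Rows are sorted by pmid as text, not as numbers; if NONE markers should be combined with real matches instead of being overridden by them, find_dirs is the place to change.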