PIUG 2021 - Wednesday


PIUG 2021  WEDNESDAY  Virtual Meeting

WEDNESDAY AFTERNOON 12:45 PM – 4:00 PM

Reviewing the Patent-Extracted Chemistry “Big Bang” in PubChem
Chris Southan (TW2Informatics)
2:40 – 3:15 pm

ABSTRACT
After the first IBM deposition of 2.5 million in 2012, few would have predicted that PubChem patent-extraction submissions would have accumulated ~40 million compounds by 2021 and be indexed against patent document numbers. The four major automated submitters (CID counts in millions) are SureChEMBL (21), Google Patents (18), WIPO (17) and IBM (11). Comparisons between sources indicate that prior-art coverage of drugs, clinical compounds and exemplified leads is now remarkably high. This means that academics and SMEs can now mine patent chemistry without commercial sources, but also that large companies with access to such sources now need to search PubChem in parallel for corroboration. This patent chemistry represents a big expansion in accessible SAR space, along with the paradox that published patents are more open to text mining than the literature. However, automated extraction from documents comes with challenges. These include: a) the in-situ query interfaces for Google Patents, WIPO Patentscope and SureChEMBL each have a different look and feel; b) these three PubChem sources are puzzlingly divergent in their extracted chemistry; c) database entries cannot directly indicate IP status; d) structural quality and coverage are uneven; and e) there is an over-indexing problem of common chemicals against many patent documents.

BIOGRAPHY
Chris Southan has a PhD from LMU Munich, an MSc in Virology from Reading and a BSc Hons in Biochemistry from Dundee. His background in protein chemistry, bioinformatics, cheminformatics, drug discovery and pharmacology was acquired in both commercial and academic roles. Recent positions include Senior Cheminformatician for the Edinburgh University BPS/IUPHAR Guide to Pharmacology (2013–18 and 2020). He is the owner of TW2Informatics for consulting work (2011–12, 2019–20) and was a contractor for the AstraZeneca Knowledge Engineering Program (2009–11) and ELIXIR Database Provider Survey Coordinator at the EBI (2008–9). These were preceded by a Principal Scientist and Bioinformatics Team Leader position at AstraZeneca, Sweden (2004–7) and senior bioinformatician positions in the UK at Oxford Glycosciences, Gemini Genomics and SmithKline Beecham (see LinkedIn for details).
