{"id":323,"date":"2016-06-24T16:51:52","date_gmt":"2016-06-24T14:51:52","guid":{"rendered":"http:\/\/le-moulin-de-verre.com\/fieldnotes\/?p=323"},"modified":"2016-06-24T16:51:52","modified_gmt":"2016-06-24T14:51:52","slug":"list-sharepoint-documents-through-rest-api-and-python","status":"publish","type":"post","link":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/?p=323","title":{"rendered":"List Sharepoint documents through REST API and python"},"content":{"rendered":"<p>Albert uses Sharepoint 2013 to store several GB of documents.<\/p>\n<p>Albert needs to be able to list and possibly extract the documents in a comuter readable format, like <code>ls<\/code> would do on Unix. Of Course, he is a bit in a rush and has nothing but a corporate laptop without much privileges to do the job&#8230;.<\/p>\n<p>To make things simpler, although the Sharepoint site is hosted by his company, he has only a standard user web access and no specific provilege on the server or a powershell environment that would allow to interrogate it.<\/p>\n<p>What&#8217;s next ?<\/p>\n<p>Using Python, we can quite simply rely on Sharepoint REST API to perform basic queries.<\/p>\n<h1>Setup<\/h1>\n<h2>Use winpyton<\/h2>\n<h2>A portable Pyhton<\/h2>\n<p>Installing Python 3 and needed libraries may be tricky in a corporate environment, in particular when the web is accessed through a NTLM proxy that also rewrites SSL certificates.<\/p>\n<p>I ended up installing Winpython (portable environment that bundles a recent PIP version).<\/p>\n<h3>using pip to access pypi.org behind a proxy<\/h3>\n<p>By default something like <code>pip --proxy=http:\/\/user:pass@proxyAddress:proxyPort<\/code><br \/>\nshould do the trick.<br \/>\nI rely on a local CNTML proxy to authenticate my requests to the corporate proxy, as this avoid having to manage the account\/password\u00a0 in the commands<br \/>\n<code>pip --proxy=http:\/\/localhost:3128 install my_special_package<\/code><\/p>\n<p>However in my case the proxy rewrites SSL certificates and sign them with a corporate cert. I need to tell pip to ignore pypi.org certificate validation.<\/p>\n<pre><code>C:\\WinPython-64bit-3.4.4.2\\scripts&gt;pip --proxy=http:\/\/localhost:3128 --trusted-host pypi.python.org install my_special_package\n<\/code><\/pre>\n<p>Ideally we should be able to tell pip to trust the corporate certifcate (did not work in my case though, possibly due to a problem of cascading certs.):<\/p>\n<pre><code>pip --cert \/etc\/ssl\/certs\/FOO_Root_CA.pem install my_special_package\n<\/code><\/pre>\n<h3>Installing dependencies<\/h3>\n<p>We will also need to authicate through NTLM to access the Sharepoint server(basic auth as shown in most online sample does not work here, we get error 401).<\/p>\n<pre><code>pip --proxy=http:\/\/localhost:3128 --trusted-host pypi.python.org install requests\n# requests_ntlm package to authenticate http requests\npip --proxy=http:\/\/localhost:3128 --trusted-host pypi.python.org install requests_ntlm\n# sharepoint lib... that did not work though\npip --proxy=http:\/\/localhost:3128 --trusted-host pypi.python.org install sharepoint\n<\/code><\/pre>\n<h1>the script<\/h1>\n<pre><code class=\"python\">import requests\nimport json\nfrom requests.auth import HTTPBasicAuth\nfrom requests_ntlm import HttpNtlmAuth\nimport getpass\n\n# 24-06-2016 - olivier de Meringo... and a lot of nice guys on the web.\n#\n# Experimental listing files and directories and some metadata from a sharepoint 2013 site.\n# More or less like ls-al on sharepoint where you only have non-admin access (i.e. only REST API).\n\n\n# ITEMS = \"http:\/\/mysite.corp\/site\/subsitename\/_api\/lists\/getbytitle('ESD repository')\/items\"\n# SITE = \"http:\/\/http:\/\/mysite.corp\/site\/subsitename\/_api\/web\/getfilebyserverrelativeurl('\/')\"\n#USER_FOLDERS= \"http:\/\/mysite.corp\/site\/subsitename\/_api\/web\/getfolderbyserverrelativeurl('\/site\/subsitename\/Documents')\/Folders\"\n\nMYDOC_REPOSITORY = \"http:\/\/mysite.corp\/site\/subsitename\/_api\/web\/getfolderbyserverrelativeurl('\/site\/subsitename\/Documents')\"\n\n\n# User name with nt domain in case of NTLM (USERNAME = \"mydomain\\\\mylogin\")\nUSERNAME = \"mydomain\\\\mylogin\"\n# prompt password\nPASSWORD = getpass.getpass('Password for '+USERNAME+':')\n\n\n\n\ndef get_json_ntlm(query):\nheaders = {'accept': 'application\/json;odata=verbose'}\nr = requests.get(query, auth=HttpNtlmAuth(USERNAME,PASSWORD), headers=headers)\nreturn r\n\n# display json in human readable format\ndef pp_json(json_thing, sort=True, indents=4):\nif type(json_thing) is str:\nprint(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))\nelse:\nprint(json.dumps(json_thing, sort_keys=sort, indent=indents))\nreturn None\n\n\n\ndef print_files_recursivelly(folderURI):\n# get current folder as json\nfolderJson = get_json_ntlm(folderURI).json()\n\n# gets the URI that describes the files of this folder\nfilesURI = folderJson[\"d\"][\"Files\"][\"__deferred\"][\"uri\"]\nprint_files_detail(filesURI)\n\n# Continue on subfolders\nsubURI = folderJson[\"d\"][\"Folders\"][\"__deferred\"][\"uri\"]\nsubfolders = get_json_ntlm(subURI).json()\nfor sub in subfolders[\"d\"][\"results\"]:\nprint_files_recursivelly (sub[\"__metadata\"][\"uri\"])\nreturn None\n\ndef print_files_detail(filesURI):\nfiles = get_json_ntlm(filesURI).json()\nfor afile in files[\"d\"][\"results\"]:\nprint(afile[\"Name\"]+ \";\"+ afile[\"ServerRelativeUrl\"]+\";\"+ afile[\"Length\"] +\";\"+ afile[\"TimeCreated\"] +\";\"+ afile[\"TimeLastModified\"] +\";\")\nreturn None\n\ndef print_folder_detail(folderJson):\n# print it's name and relative URL\nprint ( \"{0} ({1} child(s))\".format(folderJson[\"d\"][\"ServerRelativeUrl\"], folderJson[\"d\"][\"ItemCount\"]))\nreturn None\n\ndef print_folder_hierarchy(folderURI):\n# get the folder as json\nfolderJson = get_json_ntlm(folderURI).json()\nprint_folder_detail(folderJson)\n\n# gets children files\n# filesURI = folderJsonRes[\"d\"][\"Files\"][\"__deferred\"][\"uri\"]\n# print_files_detail(filesURI)\n\n# get subfolders uri\nsubURI = folderJson[\"d\"][\"Folders\"][\"__deferred\"][\"uri\"]\nsubfolders = get_json_ntlm(subURI).json()\nfor sub in subfolders[\"d\"][\"results\"]:\nprint_folder_hierarchy (sub[\"__metadata\"][\"uri\"])\nreturn None\n\nprint_folder_hierarchy(MYDOC_REPOSITORY)\nprint(\"-------------------------------------------------\")\nprint_files_recursivelly(MYDOC_REPOSITORY)\n\n<\/code><\/pre>\n<h1>quick notes &#8211; improvements<\/h1>\n<h2>I should rather use python-sharepoint library<\/h2>\n<p>Python-sharepoint library is expected abstract the REST API but I did not manage to use it, possibly due to a problem with NTLM opener.<\/p>\n<p>See [https:\/\/github.com\/ox-it\/python-sharepoint\/issues\/3]<\/p>\n<h2>Use CAML to filter queries<\/h2>\n<p>REST API can use CAML to reduce and optimize queries. However, it will requiere that REST queries are send through POST (instead of GET) which may conplexify the code&#8230;. but definitly somehing to explore.<\/p>\n<p>See<br \/>\n&#8211; [http:\/\/sharepoint.stackexchange.com\/questions\/78612\/why-do-i-have-to-use-post-for-rest-queries]<br \/>\n&#8211; [https:\/\/chuvash.eu\/2014\/03\/25\/using-caml-with-sharepoint-rest-api\/]<\/p>\n<h2>Json format reference for files and folders<\/h2>\n<ul>\n<li>Files collections: [https:\/\/msdn.microsoft.com\/en-us\/library\/office\/dn450841.aspx#bk_FileCollection]<\/li>\n<li>Folder collections: [https:\/\/msdn.microsoft.com\/en-us\/library\/office\/dn450841.aspx#bk_FolderCollection]<\/li>\n<li>Folders properties: [https:\/\/msdn.microsoft.com\/en-us\/library\/office\/dn450841.aspx#bk_FolderProperties]<\/li>\n<\/ul>\n<h2>usefull queries<\/h2>\n<p>Get All Lists &#8220;`http:\/\/server\/site\/_api\/lists&#8220;<\/p>\n<p>Get All List Items From a Single List`:\u00a0 <code>http:\/\/server\/site\/_api\/lists\/getbytitle(\u2018listname\u2019)\/items<\/code><\/p>\n<p>Get a Single List Item <code>http:\/\/server\/site\/_api\/lists\/getbytitle(\u2018listname\u2019)\/items<\/code><br \/>\nGet Back Certain Columns <code>http:\/\/server\/site\/_api\/lists\/getbytitle(\u2018listname\u2019)\/items?$select=Title,Id<\/code><br \/>\nOrder Your Results\u00a0 &#8220;&#8220;http:\/\/server\/site\/_api\/lists\/getbytitle(\u2018listname\u2019)\/items?$orderby=Title&#8220;`<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Albert uses Sharepoint 2013 to store several GB of documents. Albert needs to be able to list and possibly extract the documents in a comuter readable format, like ls would do on Unix. Of Course, he is a bit in a rush and has nothing but a corporate laptop without much privileges to do the &hellip; <a href=\"https:\/\/le-moulin-de-verre.com\/fieldnotes\/?p=323\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;List Sharepoint documents through REST API and python&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[1],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5WcEf-5d","_links":{"self":[{"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=\/wp\/v2\/posts\/323"}],"collection":[{"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=323"}],"version-history":[{"count":3,"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=\/wp\/v2\/posts\/323\/revisions"}],"predecessor-version":[{"id":326,"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=\/wp\/v2\/posts\/323\/revisions\/326"}],"wp:attachment":[{"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=323"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=323"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/le-moulin-de-verre.com\/fieldnotes\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=323"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}