List Sharepoint documents through REST API and python

Albert uses Sharepoint 2013 to store several GB of documents.

Albert needs to be able to list and possibly extract the documents in a comuter readable format, like ls would do on Unix. Of Course, he is a bit in a rush and has nothing but a corporate laptop without much privileges to do the job….

To make things simpler, although the Sharepoint site is hosted by his company, he has only a standard user web access and no specific provilege on the server or a powershell environment that would allow to interrogate it.

What’s next ?

Using Python, we can quite simply rely on Sharepoint REST API to perform basic queries.

Setup

Use winpyton

A portable Pyhton

Installing Python 3 and needed libraries may be tricky in a corporate environment, in particular when the web is accessed through a NTLM proxy that also rewrites SSL certificates.

I ended up installing Winpython (portable environment that bundles a recent PIP version).

using pip to access pypi.org behind a proxy

By default something like pip --proxy=http://user:pass@proxyAddress:proxyPort
should do the trick.
I rely on a local CNTML proxy to authenticate my requests to the corporate proxy, as this avoid having to manage the account/password  in the commands
pip --proxy=http://localhost:3128 install my_special_package

However in my case the proxy rewrites SSL certificates and sign them with a corporate cert. I need to tell pip to ignore pypi.org certificate validation.

C:\WinPython-64bit-3.4.4.2\scripts>pip --proxy=http://localhost:3128 --trusted-host pypi.python.org install my_special_package

Ideally we should be able to tell pip to trust the corporate certifcate (did not work in my case though, possibly due to a problem of cascading certs.):

pip --cert /etc/ssl/certs/FOO_Root_CA.pem install my_special_package

Installing dependencies

We will also need to authicate through NTLM to access the Sharepoint server(basic auth as shown in most online sample does not work here, we get error 401).

pip --proxy=http://localhost:3128 --trusted-host pypi.python.org install requests
# requests_ntlm package to authenticate http requests
pip --proxy=http://localhost:3128 --trusted-host pypi.python.org install requests_ntlm
# sharepoint lib... that did not work though
pip --proxy=http://localhost:3128 --trusted-host pypi.python.org install sharepoint

the script

import requests
import json
from requests.auth import HTTPBasicAuth
from requests_ntlm import HttpNtlmAuth
import getpass

# 24-06-2016 - olivier de Meringo... and a lot of nice guys on the web.
#
# Experimental listing files and directories and some metadata from a sharepoint 2013 site.
# More or less like ls-al on sharepoint where you only have non-admin access (i.e. only REST API).


# ITEMS = "http://mysite.corp/site/subsitename/_api/lists/getbytitle('ESD repository')/items"
# SITE = "http://http://mysite.corp/site/subsitename/_api/web/getfilebyserverrelativeurl('/')"
#USER_FOLDERS= "http://mysite.corp/site/subsitename/_api/web/getfolderbyserverrelativeurl('/site/subsitename/Documents')/Folders"

MYDOC_REPOSITORY = "http://mysite.corp/site/subsitename/_api/web/getfolderbyserverrelativeurl('/site/subsitename/Documents')"


# User name with nt domain in case of NTLM (USERNAME = "mydomain\\mylogin")
USERNAME = "mydomain\\mylogin"
# prompt password
PASSWORD = getpass.getpass('Password for '+USERNAME+':')




def get_json_ntlm(query):
headers = {'accept': 'application/json;odata=verbose'}
r = requests.get(query, auth=HttpNtlmAuth(USERNAME,PASSWORD), headers=headers)
return r

# display json in human readable format
def pp_json(json_thing, sort=True, indents=4):
if type(json_thing) is str:
print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
else:
print(json.dumps(json_thing, sort_keys=sort, indent=indents))
return None



def print_files_recursivelly(folderURI):
# get current folder as json
folderJson = get_json_ntlm(folderURI).json()

# gets the URI that describes the files of this folder
filesURI = folderJson["d"]["Files"]["__deferred"]["uri"]
print_files_detail(filesURI)

# Continue on subfolders
subURI = folderJson["d"]["Folders"]["__deferred"]["uri"]
subfolders = get_json_ntlm(subURI).json()
for sub in subfolders["d"]["results"]:
print_files_recursivelly (sub["__metadata"]["uri"])
return None

def print_files_detail(filesURI):
files = get_json_ntlm(filesURI).json()
for afile in files["d"]["results"]:
print(afile["Name"]+ ";"+ afile["ServerRelativeUrl"]+";"+ afile["Length"] +";"+ afile["TimeCreated"] +";"+ afile["TimeLastModified"] +";")
return None

def print_folder_detail(folderJson):
# print it's name and relative URL
print ( "{0} ({1} child(s))".format(folderJson["d"]["ServerRelativeUrl"], folderJson["d"]["ItemCount"]))
return None

def print_folder_hierarchy(folderURI):
# get the folder as json
folderJson = get_json_ntlm(folderURI).json()
print_folder_detail(folderJson)

# gets children files
# filesURI = folderJsonRes["d"]["Files"]["__deferred"]["uri"]
# print_files_detail(filesURI)

# get subfolders uri
subURI = folderJson["d"]["Folders"]["__deferred"]["uri"]
subfolders = get_json_ntlm(subURI).json()
for sub in subfolders["d"]["results"]:
print_folder_hierarchy (sub["__metadata"]["uri"])
return None

print_folder_hierarchy(MYDOC_REPOSITORY)
print("-------------------------------------------------")
print_files_recursivelly(MYDOC_REPOSITORY)

quick notes – improvements

I should rather use python-sharepoint library

Python-sharepoint library is expected abstract the REST API but I did not manage to use it, possibly due to a problem with NTLM opener.

See [https://github.com/ox-it/python-sharepoint/issues/3]

Use CAML to filter queries

REST API can use CAML to reduce and optimize queries. However, it will requiere that REST queries are send through POST (instead of GET) which may conplexify the code…. but definitly somehing to explore.

See
– [http://sharepoint.stackexchange.com/questions/78612/why-do-i-have-to-use-post-for-rest-queries]
– [https://chuvash.eu/2014/03/25/using-caml-with-sharepoint-rest-api/]

Json format reference for files and folders

  • Files collections: [https://msdn.microsoft.com/en-us/library/office/dn450841.aspx#bk_FileCollection]
  • Folder collections: [https://msdn.microsoft.com/en-us/library/office/dn450841.aspx#bk_FolderCollection]
  • Folders properties: [https://msdn.microsoft.com/en-us/library/office/dn450841.aspx#bk_FolderProperties]

usefull queries

Get All Lists “`http://server/site/_api/lists“

Get All List Items From a Single List`:  http://server/site/_api/lists/getbytitle(‘listname’)/items

Get a Single List Item http://server/site/_api/lists/getbytitle(‘listname’)/items
Get Back Certain Columns http://server/site/_api/lists/getbytitle(‘listname’)/items?$select=Title,Id
Order Your Results  ““http://server/site/_api/lists/getbytitle(‘listname’)/items?$orderby=Title“`

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.