U Ú²gð:ã@s<dZddlZGdd„deƒZGdd„dƒZGdd„dƒZdS) zÇ This module contains a tokenizer for Excel formulae. The tokenizer is based on the Javascript tokenizer found at http://ewbi.blogs.com/develops/2004/12/excel_formula_p.html written by Eric Bachtal éNc@seZdZdZdS)ÚTokenizerErrorz$Base class for all Tokenizer errors.N)Ú__name__Ú __module__Ú__qualname__Ú__doc__©rrú>/tmp/pip-unpacked-wheel-dtlj1ams/openpyxl/formula/tokenizer.pyrsrc@s´eZdZdZe d¡Ze d¡Ze d¡e d¡dœZdZ dZ d d „Zdd„Zd d„Z dd„Zdd„Zdd„Zdd„Zdd„Zdd„Zdd„Zdd„Zd'd d!„Zd"d#„Zd$d%„Zd&S)(Ú Tokenizera^ A tokenizer for Excel worksheet formulae. Converts a str string representing an Excel formula (in A1 notation) into a sequence of `Token` objects. `formula`: The str string to tokenize Tokenizer defines a method `._parse()` to parse the formula into tokens, which can then be accessed through the `.items` attribute. z^[1-9](\.[0-9]+)?[Ee]$z[ \n]+z"(?:[^"]*"")*[^"]*"(?!")z'(?:[^']*'')*[^']*'(?!')©ú"ú')z#NULL!z#DIV/0!z#VALUE!z#REF!z#NAME?z#NUM!z#N/Az #GETTING_DATAz,;}) +-*/^&=><%cCs*||_g|_g|_d|_g|_| ¡dS)Nr)ÚformulaÚitemsÚtoken_stackÚoffsetÚtokenÚ_parse)Úselfr rrrÚ__init__.szTokenizer.__init__c Cs>|jr dS|jsdS|jddkr2|jd7_n|j t|jtjƒ¡dSd|jfd|jfd|jfd|j fd |j fd |j fd|jfd|jfd |j ff }i}|D]\}}| t ||¡¡q |jt|jƒkr2| ¡rØq¼|j|j}||jkrö| ¡||kr|j||ƒ7_q¼|j |¡|jd7_q¼| ¡dS)z5Populate self.items with the tokens from the formula.Nrú=éz"'ú[ú#ú Ú z +-*/^&=><%z{(ú)}z;,)rr rÚappendÚTokenÚLITERALÚ _parse_stringÚ_parse_bracketsÚ_parse_errorÚ_parse_whitespaceÚ_parse_operatorÚ _parse_openerÚ _parse_closerÚ_parse_separatorÚupdateÚdictÚfromkeysÚlenÚcheck_scientific_notationÚTOKEN_ENDERSÚ save_tokenr)rZ consumersÚ dispatcherÚcharsZconsumerÚ curr_charrrrr7s@÷ zTokenizer._parsecCs¬|jdd|j|j}|dks$t‚|j|}| |j|jd…¡}|dkrr|dkrXdnd}td|›d |j›ƒ‚| d ¡}|dkr˜|j t |¡¡n|j |¡t |ƒS)a¹ Parse a "-delimited string or '-delimited link. The offset must be pointing to either a single quote ("'") or double quote ('"') character. The strings are parsed according to Excel rules where to escape the delimiter you just double it up. E.g., "abc""def" in Excel is parsed as 'abc"def' in Python. Returns the number of characters matched. (Does not update self.offset) ú:©Ú can_followr NrÚstringÚlinkz%Reached end of formula while parsing z in r)Úassert_empty_tokenr rÚAssertionErrorÚSTRING_REGEXESÚmatchrÚgrouprrrÚmake_operandrr*)rÚdelimÚregexr9Úsubtyperrrr_s zTokenizer._parse_stringcCsÄ|j|jdkst‚dd„t d|j|jd…¡Dƒ}dd„t d|j|jd…¡Dƒ}d}t||ƒD]F\}}||7}|dkrh|d }|j |j|j|j|…¡|Sqhtd |j›ƒ‚dS)zœ Consume all the text between square brackets []. Returns the number of characters matched. (Does not update self.offset) rcSsg|]}| ¡df‘qS)r©Ústart©Ú.0ÚtrrrÚ „sz-Tokenizer._parse_brackets..z\[NcSsg|]}| ¡df‘qS)éÿÿÿÿr?rArrrrD†sz\]rrzEncountered unmatched '[' in ) r rr7ÚreÚfinditerÚsortedrrr)rZleftsZrightsZ open_countÚidxZ open_closeZouter_rightrrrr {s"ÿÿÿ zTokenizer._parse_bracketscCsš|jdd|j|jdks t‚|j|jd…}|jD]D}| |¡r6|j t d |j¡|¡¡|jdd…=t|ƒSq6t d|j›d|j›dƒ‚dS) zÃ Consume the text following a '#' as an error. Looks for a match in self.ERROR_CODES and returns the number of characters matched. (Does not update self.offset) ú!r2rNÚzInvalid error code at position ú in 'r)r6r rr7ÚERROR_CODESÚ startswithrrrr;Újoinrr*r)rZ subformulaÚerrrrrr!”s zTokenizer._parse_errorcCsL|j|jdkst‚|j t|j|jtjƒ¡|j |j|jd…¡ ¡S)z† Consume a string of consecutive spaces. Returns the number of spaces found. (Does not update self.offset). )rrN) r rr7rrrÚWSPACEÚ WSPACE_REr9Úend©rrrrr"¦szTokenizer._parse_whitespacecCs|j|j|jd…dkrD|j t|j|j|jd…tjƒ¡dS|j|j}|dks\t‚|dkrrtdtjƒ}nŠ|dkrˆt|tjƒ}nt|jsœt|tjƒ}n`t dd„t |jƒDƒdƒ}|oÜ|jtjkpÜ|j tjkpÜ|j tjk}|rðt|tjƒ}nt|tjƒ}|j |¡d S) z Consume the characters constituting an operator. Returns the number of characters consumed. (Does not update self.offset) é)z>=z<=z<>z %*/^&=><+-ú%z*/^&=>Ésÿz,Tokenizer._parse_operator..Nr)r rrrrÚOP_INr7ÚOP_POSTÚOP_PREÚnextÚreversedr>ÚCLOSErXÚOPERAND)rr0rÚprevZis_infixrrrr#±s8þÿ ÿ ýzTokenizer._parse_operatorcCsŒ|j|jdkst‚|j|jdkr8| ¡t d¡}n8|jrfd |j¡d}|jdd…=t |¡}n t d¡}|j |¡|j |¡dS)z‰ Consumes a ( or { character. Returns the number of characters consumed. (Does not update self.offset) )ú(Ú{rdrKrcNr)r rr7r6rÚmake_subexprrOrrr)rrZtoken_valuerrrr$×s zTokenizer._parse_openercCsR|j|jdkst‚|j ¡ ¡}|j|j|jkrBtd|jƒ‚|j |¡dS)z‰ Consumes a } or ) character. Returns the number of characters consumed. (Does not update self.offset) )ú)Ú}zMismatched ( and { pair in '%s'r) r rr7rÚpopÚ get_closerÚvaluerrr)rrrrrr%ísÿzTokenizer._parse_closercCs|j|j}|dkst‚|dkr,t d¡}nTz|jdj}Wn tk r\tdtjƒ}Yn$X|tj krvtdtjƒ}n t d¡}|j |¡dS)z‰ Consumes a ; or , character. Returns the number of characters consumed. (Does not update self.offset) )ú;ú,rkrErlr)r rr7rÚmake_separatorrrXÚ IndexErrorr[ÚPARENrr)rr0rZtop_typerrrr&ýs zTokenizer._parse_separatorcCsX|j|j}|dkrTt|jƒdkrT|j d |j¡¡rT|j |¡|jd7_dSdS)z¾ Consumes a + or - character if part of a number in sci. notation. Returns True if the character was consumed and self.offset was updated, False otherwise. z+-rrKTF)r rr*rÚSN_REr9rOr)rr0rrrr+sÿþz#Tokenizer.check_scientific_notationrcCs2|jr.|jd|kr.td|j›d|j›dƒ‚dS)a: Ensure that there's no token currently being parsed. Or if there is a token being parsed, it must end with a character in can_follow. If there are unconsumed token contents, it means we hit an unexpected token transition. In this case, we raise a TokenizerError rEz!Unexpected character at position rLrN)rrrr )rr3rrrr6'szTokenizer.assert_empty_tokencCs0|jr,|j t d |j¡¡¡|jdd…=dS)z9If there's a token being parsed, add it to the item list.rKN)rrrrr;rOrTrrrr-5szTokenizer.save_tokencCsB|js dS|jdjtjkr(|jdjSdd dd„|jDƒ¡S)z+Convert the parsed tokens back to a string.rKrrcss|]}|jVqdSrW)rj)rBrrrrrZAsz#Tokenizer.render..)rrXrrrjrOrTrrrÚrender;s zTokenizer.renderN)r)rrrrrFÚcompilerprRr8rMr,rrrr r!r"r#r$r%r&r+r6r-rqrrrrr s, ú (& r c@s¦eZdZdZdddgZdZdZdZdZd Z d Z dZdZd Z dZd'dd„ZdZdZdZdZdZdd„Zedd„ƒZdZdZed(dd„ƒZd d!„Zd"Zd#Zed$d%„ƒZd&S))ra) A token in an Excel formula. Tokens have three attributes: * `value`: The string value parsed that led to this token * `type`: A string identifying the type of token * `subtype`: A string identifying subtype of the token (optional, and defaults to "") rjrXr>rraÚFUNCÚARRAYroÚSEPzOPERATOR-PREFIXzOPERATOR-INFIXzOPERATOR-POSTFIXzWHITE-SPACErKcCs||_||_||_dSrW)rjrXr>)rrjÚtype_r>rrrr_szToken.__init__ÚTEXTÚNUMBERÚLOGICALÚERRORÚRANGEcCsd |j|j|j¡S)Nz{0} {1} {2}:)ÚformatrXr>rjrTrrrÚ__repr__qszToken.__repr__cCsp| d¡r|j}nP| d¡r$|j}n>|dkr4|j}n.zt|ƒ|j}Wntk r`|j}YnX|||j|ƒS)zCreate an operand token.rr)ÚTRUEÚFALSE) rNrwrzryÚfloatrxÚ ValueErrorr{ra©Úclsrjr>rrrr;ts zToken.make_operandÚOPENr`FcCsr|ddkst‚|r,t d|¡s$t‚tj}n&|dkrrrrre‘s zToken.make_subexpcCsT|j|j|j|jfkst‚|j|jks*t‚|j|jkr:dnd}|j||j|jkdS)z6Return a closing token that matches this token's type.rgrf)r…)rXrsrtror7r>r„re)rrjrrrri§szToken.get_closerÚARGÚROWcCs.|dkst‚|dkr|jn|j}|||j|ƒS)zCreate a separator token)rlrkrl)r7r†r‡rur‚rrrrm¹szToken.make_separatorN)rK)F)rrrrÚ __slots__rrarsrtrorur]r[r\rQrrwrxryrzr{r}Úclassmethodr;r„r`rerir†r‡rmrrrrrDs< r)rrFÚ Exceptionrr rrrrrÚs6