@@ -357,6 +357,146 @@ takes a list of columns to sort by.
357357    tips = tips.sort_values(['sex', 'total_bill'])
358358    tips.head()
359359
360+
361+ String Processing
362+ -----------------
363+
364+ Length
365+ ~~~~~~
366+
367+ SAS determines the length of a character string with the
368+ `LENGTHN <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002284668.htm>`__
369+ and `LENGTHC <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002283942.htm>`__
370+ functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailing blanks.
371+
372+ .. code-block:: none
373+
374+    data _null_;
375+    set tips;
376+    put(LENGTHN(time));
377+    put(LENGTHC(time));
378+    run;
379+
380+ Python determines the length of a character string with the ``len`` function.
381+ ``len`` includes trailing blanks. Use ``len`` and ``rstrip`` to exclude
382+ trailing blanks.
383+
384+ .. ipython:: python
385+
386+    tips['time'].str.len().head()
387+    tips['time'].str.rstrip().str.len().head()
388+
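The same distinction can be seen on a single plain Python string, which is what the ``.str`` methods apply to each element. This is a minimal sketch; the value ``padded`` is made up for illustration and is not taken from ``tips``.

```python
# Illustrative value only (not from the tips dataset): three trailing blanks.
padded = "Dinner   "

# len counts every character, including the trailing blanks.
length_with_blanks = len(padded)

# Stripping trailing whitespace first gives the length without them.
length_without_blanks = len(padded.rstrip())

print(length_with_blanks, length_without_blanks)
```

Note that ``rstrip`` with no arguments removes all trailing whitespace, not only blanks.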
389+
390+ Find
391+ ~~~~
392+
393+ SAS determines the position of a character in a string with the
394+ `FINDW <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002978282.htm>`__ function.
395+ ``FINDW`` takes the string defined by the first argument and searches for the first position of the substring
396+ you supply as the second argument.
397+
398+ .. code-block:: none
399+
400+    data _null_;
401+    set tips;
402+    put(FINDW(sex,'ale'));
403+    run;
404+
405+ Python determines the position of a substring in a string with the
406+ ``find`` method. ``find`` searches for the first occurrence of the
407+ substring and, if it is found, returns its starting
408+ position. Keep in mind that Python indexes are zero-based and
409+ that ``find`` returns -1 if it fails to find the substring.
410+
411+ .. ipython:: python
412+
413+    tips['sex'].str.find("ale").head()
414+
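To make the zero-based result and the -1 sentinel concrete, here is a small plain-Python sketch. The helper ``one_based_position`` is hypothetical; it only shows how to convert the result of ``find`` to a one-based position in which 0 means "not found".

```python
def one_based_position(text, substring):
    """Hypothetical helper: one-based position of substring, or 0 if absent."""
    # str.find returns a zero-based index, or -1 when there is no match,
    # so adding 1 yields a one-based position with 0 as the "not found" value.
    return text.find(substring) + 1


zero_based_female = "Female".find("ale")   # match starts at index 3
zero_based_male = "Male".find("ale")       # match starts at index 1
not_found = "Male".find("xyz")             # no match

print(zero_based_female, zero_based_male, not_found)
print(one_based_position("Female", "ale"), one_based_position("Male", "xyz"))
```

Because -1 is also a valid index in Python (the last character), check for it explicitly before using the result to slice a string.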
415+
416+ Substring
417+ ~~~~~~~~~
418+
419+ SAS extracts a substring from a string based on its position with the
420+ `SUBSTR <http://www2.sas.com/proceedings/sugi25/25/cc/25p088.pdf>`__ function.
421+
422+ .. code-block:: none
423+
424+    data _null_;
425+    set tips;
426+    put(substr(sex,1,1));
427+    run;
428+
429+ With pandas you can use ``[]`` notation to extract a substring
430+ from a string by position locations. Keep in mind that Python
431+ indexes are zero-based.
432+
433+ .. ipython:: python
434+
435+    tips['sex'].str[0:1].head()
436+
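The mapping between a one-based (position, length) pair and a zero-based Python slice can be sketched as follows. The helper ``substr`` is hypothetical and written only for illustration; it is not part of pandas or the standard library.

```python
def substr(text, position, length):
    """Hypothetical helper: substring from a one-based position and a length."""
    start = position - 1                 # shift to a zero-based start index
    return text[start:start + length]    # the slice end is exclusive


first_letter = substr("Female", 1, 1)    # same as "Female"[0:1]
middle = substr("Female", 3, 3)          # same as "Female"[2:5]

print(first_letter, middle)
```

A slice that runs past the end of the string is truncated rather than raising an error, so ``substr("Male", 3, 10)`` in this sketch returns ``"le"``.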
437+
438+ Scan
439+ ~~~~
440+
441+ The SAS `SCAN <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm>`__
442+ function returns the nth word from a string. The first argument is the string you want to parse and the
443+ second argument specifies which word you want to extract.
444+
445+ .. code-block:: none
446+
447+    data firstlast;
448+    input String $60.;
449+    First_Name = scan(string, 1);
450+    Last_Name = scan(string, -1);
451+    datalines;
452+    John Smith
453+    Jane Cook
454+    ;
455+    run;
456+
457+ Python extracts the nth word from a string by splitting it on a
458+ delimiter and selecting a piece by position. Regular expressions offer more powerful
459+ approaches, but this just shows a simple approach.
460+
461+ .. ipython:: python
462+
463+    firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
464+    firstlast['First_Name'] = firstlast['String'].str.split(" ", expand=True)[0]
465+    firstlast['Last_Name'] = firstlast['String'].str.rsplit(" ", expand=True)[1]
466+    firstlast
467+
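For strings with a varying number of words, indexing the split result from the end is more robust than picking a fixed column. The sketch below uses plain Python; the helper ``scan`` is hypothetical, splits on whitespace only, and returns an empty string when the requested word does not exist, which is a design choice of this sketch.

```python
def scan(text, n):
    """Hypothetical helper: nth whitespace-delimited word (one-based).

    A negative n counts from the end of the string. Returns "" when n is 0
    or when the string has fewer than abs(n) words.
    """
    words = text.split()
    if n == 0 or abs(n) > len(words):
        return ""
    # Positive n is shifted to a zero-based index; negative n already
    # matches Python's from-the-end indexing.
    return words[n - 1] if n > 0 else words[n]


first_name = scan("John Smith", 1)
last_name = scan("Mary Jane Cook", -1)

print(first_name, last_name)
```

Calling ``split`` with no arguments also collapses runs of whitespace, so extra blanks between words do not produce empty pieces.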
468+
469+ Upcase, Lowcase, and Propcase
470+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
471+
472+ The SAS `UPCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245965.htm>`__,
473+ `LOWCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245912.htm>`__, and
474+ `PROPCASE <http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/a002598106.htm>`__
475+ functions change the case of the argument.
476+
477+ .. code-block:: none
478+
479+    data firstlast;
480+    input String $60.;
481+    string_up = UPCASE(string);
482+    string_low = LOWCASE(string);
483+    string_prop = PROPCASE(string);
484+    datalines;
485+    John Smith
486+    Jane Cook
487+    ;
488+    run;
489+
490+ The equivalent Python functions are ``upper``, ``lower``, and ``title``.
491+
492+ .. ipython:: python
493+
494+    firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
495+    firstlast['string_up'] = firstlast['String'].str.upper()
496+    firstlast['string_low'] = firstlast['String'].str.lower()
497+    firstlast['string_prop'] = firstlast['String'].str.title()
498+    firstlast
499+
360500 Merging
361501-------
362502